1 EWAS introduction

DNA methylation quantity is expressed as \(\beta\)

\[ \beta = M/(M + U) \]

Where M = hybridization signal from a methylated version of a cytosine nucleotide and U = hybridization signal from an unmethylated version of a cytosine nucleotide. Beta can be interpreted as the proportion of methylation signal for a probe, and values range from 0 to 1. Beta is easy to interpret for humans, but typically has a bimodal distribution that is suboptimal for statistical modeling. Thus, we analyze M-values, which are another way to express methylation values for probes.

\[ M-value = log_2(M/U) \]

EPIC array has type 1 and type 2 probes. Type 1 probes are from the old 27K array that uses 2 bead types per CpG. The type I probes are labeled I-green and I-red. The newer type 2 probes uses one bead type. Most of the EPIC array probes are type II.

A detection probability represents the probability of a detected signal being background flourescence. If the probability is high, the signal is more likely to be background, and the value should be set to missing.

Standard workflows suggest to remove data points with detection P-value > 0.05.

2 Sesame package processing of Illumina EPIC array

2.1 Sesame package intro

  • EWAS array Illumina EPIC 850

  • Sesame is a bioconductor package

    • Improvements on previous EWAS packages for low-level processing.
    • Existing methods do not identify artifacts associated with detection failure. Sources of failure include: insufficient DNA due to germline or somatic deletions or hyperpolymorphism, probe cross-hybridization.
    • Probes are masked (set to NA) with two methods: P-value with out-of-band array hybridization (pOOBAH) so that probes with detection p-value > 0.05 are masked. Probes with design issues such as overlapping SNPs are masked. Minfi uses negative control probes for background, but there are only 411 of them on the EPIC array. Sesame pOOBAH uses out-of-band (OOB) signals from type I probes in addition to negative control probes to improve background subtraction and detection calling. Minfi adopted pOOBAH, but false positive rate higher than Sesame’s implementation (Fig 2 Zhou 2018, NAR).
    • Non-linear dye bias correction. Minfi uses linear scaling. Sesame uses nonlinear scaling to better fit dye bias.
knitr::opts_chunk$set(cache.lazy = FALSE)

2.2 Sesame installation on UCR cluster

Installation of sesame and dependencies went fine, no errors or warnings.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("sesame", lib = "~/Rlibs")

Installed sesameData as a dependency.

Can check version of loaded packages with sessionInfo()

Sesame version 1.6 used.

2.3 Load libraries

This includes libraries for Sesame and Minfi.

library(tidyverse)
library(readxl)
library(knitr)
library(sesame)
library(wheatmap)
library(multtest)
library(limma)
library(RColorBrewer)
library(minfi)
library(IlluminaHumanMethylationEPICmanifest)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)

Set global variables.

dir_out <- "../results/"

2.4 Mask and normalize data and create eset

Sesame pipeline involves probe masking with pOOBAH, normalization with noob, and nonlinear dye bias correction.

Noob = background subtraction based on normal-exponential deconvolution using out-of-band probes.

The pipeline can be thought of as adding pOOBAH to noob. Then also the non-linear dye bias correction should be better than minfi’s linear correction.

OpenSesame takes 12 minutes, so run it once and save the output.

IDATprefixes <- searchIDATprefixes(dir.name = "../data/raw/idatFiles/")
t1 <- Sys.time()
betas <- openSesame(IDATprefixes)
Sys.time() - t1
## Time difference of 11.92549 mins
#Warnings issued. These are also shown in minfi vignette, so they are expected. 
#50: In readChar(con, nchars = n) : truncating string with embedded nuls

#Convert Betas to Mvalues. Check if there are beta values that will result in infinite results.
sum(is.na(betas))
## [1] 4462814
sum(is.na(betas))/length(betas)
## [1] 0.1610579
sum(betas==0 & !is.na(betas))
## [1] 0
sum(betas==1 & !is.na(betas))
## [1] 0
sum(betas<0 & !is.na(betas))
## [1] 0
sum(betas>1 & !is.na(betas))
## [1] 0
Mvals <- BetaValueToMValue(betas)

#column names for betas and Mvals is currently the Basename. Rename it to sampleID to make it easier to understand.
sampsheet <- read_csv("../data/raw/idatFiles/SampleSheet.csv")
#Are there true discrepancies between ExternalSampleID, SampleLabel, and Description?
#row 2 ExternalSampleID shows "+P" but SampleLabel shows "-P" and Description shows "no pyruvate". Many discrepancies like that. I ended up trusting SampleLabel and Description, but is that wrong? 
cbind(sampsheet$ExternalSampleID, sampsheet$SampleLabel, sampsheet$Description)
##       [,1]       [,2]       [,3]                                                       
##  [1,] "C -P 1"   "C -P 1"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [2,] "C +P 1"   "C -P 2"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [3,] "mtD -P 1" "C -P 3"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [4,] "mtD +P 1" "C -P 4"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [5,] "C -P 2"   "C -P 5"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [6,] "C +P 2"   "C -P 6"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [7,] "mtD -P 2" "C +P 1"   "Non-senescent mtDNA-containing controls with pyruvate"    
##  [8,] "mtD +P 2" "C +P 2"   "Non-senescent mtDNA-containing controls with pyruvate"    
##  [9,] "C -P 3"   "C +P 3"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [10,] "C +P 3"   "C +P 4"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [11,] "mtD -P 3" "C +P 5"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [12,] "mtD +P 3" "C +P 6"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [13,] "C -P 4"   "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"               
## [14,] "C +P 4"   "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"               
## [15,] "mtD -P 4" "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"               
## [16,] "mtD +P 4" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"               
## [17,] "C -P 5"   "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"               
## [18,] "C +P 5"   "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"               
## [19,] "mtD -P 5" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "mtD +P 5" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "C -P 6"   "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "C +P 6"   "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "mtD -P 6" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "mtD +P 6" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] "- ATV 1"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
## [26,] "+ ATV 1"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
## [27,] "- ATV 2"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
## [28,] "+ ATV 2"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
## [29,] "- ATV 3"  "+ ATV"    "Senescent ATV-treated cells"                              
## [30,] "+ ATV 3"  "+ ATV"    "Senescent ATV-treated cells"                              
## [31,] "- ATV 4"  "+ ATV"    "Senescent ATV-treated cells"                              
## [32,] "+ ATV 4"  "+ ATV"    "Senescent ATV-treated cells"
sum(colnames(betas) != sampsheet$Basename) #not in same order
## [1] 16
cbind(colnames(betas),sampsheet$Basename) #visually show not in same order
##       [,1]                  [,2]                 
##  [1,] "203833040074_R01C01" "203839170002_R01C01"
##  [2,] "203833040074_R02C01" "203839170002_R02C01"
##  [3,] "203833040074_R03C01" "203839170002_R03C01"
##  [4,] "203833040074_R04C01" "203839170002_R04C01"
##  [5,] "203833040074_R05C01" "203839170002_R05C01"
##  [6,] "203833040074_R06C01" "203839170002_R06C01"
##  [7,] "203833040074_R07C01" "203839170002_R07C01"
##  [8,] "203833040074_R08C01" "203839170002_R08C01"
##  [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203839170002_R01C01" "203833040074_R01C01"
## [26,] "203839170002_R02C01" "203833040074_R02C01"
## [27,] "203839170002_R03C01" "203833040074_R03C01"
## [28,] "203839170002_R04C01" "203833040074_R04C01"
## [29,] "203839170002_R05C01" "203833040074_R05C01"
## [30,] "203839170002_R06C01" "203833040074_R06C01"
## [31,] "203839170002_R07C01" "203833040074_R07C01"
## [32,] "203839170002_R08C01" "203833040074_R08C01"
cbind(colnames(betas),sampsheet$Basename[match(sampsheet$Basename, colnames(betas))])#now in correct order
##       [,1]                  [,2]                 
##  [1,] "203833040074_R01C01" "203833040074_R01C01"
##  [2,] "203833040074_R02C01" "203833040074_R02C01"
##  [3,] "203833040074_R03C01" "203833040074_R03C01"
##  [4,] "203833040074_R04C01" "203833040074_R04C01"
##  [5,] "203833040074_R05C01" "203833040074_R05C01"
##  [6,] "203833040074_R06C01" "203833040074_R06C01"
##  [7,] "203833040074_R07C01" "203833040074_R07C01"
##  [8,] "203833040074_R08C01" "203833040074_R08C01"
##  [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203839170002_R01C01" "203839170002_R01C01"
## [26,] "203839170002_R02C01" "203839170002_R02C01"
## [27,] "203839170002_R03C01" "203839170002_R03C01"
## [28,] "203839170002_R04C01" "203839170002_R04C01"
## [29,] "203839170002_R05C01" "203839170002_R05C01"
## [30,] "203839170002_R06C01" "203839170002_R06C01"
## [31,] "203839170002_R07C01" "203839170002_R07C01"
## [32,] "203839170002_R08C01" "203839170002_R08C01"
#reorder sampsheet to match sample order in data matrix
sampsheet <- sampsheet[match(sampsheet$Basename, colnames(betas)),]
sampsheet <- dplyr::rename(sampsheet, sampleID = ExternalSampleID)
cbind(sampsheet$sampleID, sampsheet$SampleLabel, sampsheet$Description)
##       [,1]       [,2]       [,3]                                                       
##  [1,] "- ATV 1"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [2,] "+ ATV 1"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [3,] "- ATV 2"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [4,] "+ ATV 2"  "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [5,] "- ATV 3"  "+ ATV"    "Senescent ATV-treated cells"                              
##  [6,] "+ ATV 3"  "+ ATV"    "Senescent ATV-treated cells"                              
##  [7,] "- ATV 4"  "+ ATV"    "Senescent ATV-treated cells"                              
##  [8,] "+ ATV 4"  "+ ATV"    "Senescent ATV-treated cells"                              
##  [9,] "C -P 3"   "C +P 3"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [10,] "C +P 3"   "C +P 4"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [11,] "mtD -P 3" "C +P 5"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [12,] "mtD +P 3" "C +P 6"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [13,] "C -P 4"   "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"               
## [14,] "C +P 4"   "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"               
## [15,] "mtD -P 4" "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"               
## [16,] "mtD +P 4" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"               
## [17,] "C -P 5"   "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"               
## [18,] "C +P 5"   "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"               
## [19,] "mtD -P 5" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "mtD +P 5" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "C -P 6"   "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "C +P 6"   "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "mtD -P 6" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "mtD +P 6" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] "C -P 1"   "C -P 1"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [26,] "C +P 1"   "C -P 2"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [27,] "mtD -P 1" "C -P 3"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [28,] "mtD +P 1" "C -P 4"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [29,] "C -P 2"   "C -P 5"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [30,] "C +P 2"   "C -P 6"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [31,] "mtD -P 2" "C +P 1"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [32,] "mtD +P 2" "C +P 2"   "Non-senescent mtDNA-containing controls with pyruvate"
sampsheet$sampleID[1:8] <- paste0("ATV", 1:8)
sampsheet$sampleID[9:32] <- paste0("midas", 1:24) 
cbind(sampsheet$sampleID, sampsheet$SampleLabel, sampsheet$Description)
##       [,1]      [,2]       [,3]                                                       
##  [1,] "ATV1"    "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [2,] "ATV2"    "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [3,] "ATV3"    "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [4,] "ATV4"    "- ATV"    "Non-senescent DMSO-treated control cells"                 
##  [5,] "ATV5"    "+ ATV"    "Senescent ATV-treated cells"                              
##  [6,] "ATV6"    "+ ATV"    "Senescent ATV-treated cells"                              
##  [7,] "ATV7"    "+ ATV"    "Senescent ATV-treated cells"                              
##  [8,] "ATV8"    "+ ATV"    "Senescent ATV-treated cells"                              
##  [9,] "midas1"  "C +P 3"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [10,] "midas2"  "C +P 4"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [11,] "midas3"  "C +P 5"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [12,] "midas4"  "C +P 6"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [13,] "midas5"  "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"               
## [14,] "midas6"  "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"               
## [15,] "midas7"  "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"               
## [16,] "midas8"  "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"               
## [17,] "midas9"  "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"               
## [18,] "midas10" "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"               
## [19,] "midas11" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "midas12" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "midas13" "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "midas14" "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "midas15" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "midas16" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] "midas17" "C -P 1"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [26,] "midas18" "C -P 2"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [27,] "midas19" "C -P 3"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [28,] "midas20" "C -P 4"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [29,] "midas21" "C -P 5"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [30,] "midas22" "C -P 6"   "Non-senescent mtDNA-containing controls no pyruvate"      
## [31,] "midas23" "C +P 1"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [32,] "midas24" "C +P 2"   "Non-senescent mtDNA-containing controls with pyruvate"
translate_trt <- function(ch){
  switch(ch,
     "Non-senescent DMSO-treated control cells" = "noSen_ATV",
     "Non-senescent mtDNA-containing controls no pyruvate" = "noSen_Mt_noPy", 
     "Non-senescent mtDNA-containing controls with pyruvate" = "noSen_Mt_Py",
     "Non-senescent mtDNA-depleted controls cells with pyruvate" = "noSen_noMt_Py",
     "Senescent ATV-treated cells" = "sen_ATV",
     "Senescent mtDNA-depleted cells no pyruvate" = "sen_noMt_noPy"
     )
}

sampsheet <- sampsheet %>%
    mutate(treatment = map_chr(Description, translate_trt)) %>%
    mutate(treatment = factor(treatment))

cbind(sampsheet$sampleID, sampsheet$SampleLabel, sampsheet$Description, as.character(sampsheet$treatment))
##       [,1]      [,2]       [,3]                                                        [,4]           
##  [1,] "ATV1"    "- ATV"    "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
##  [2,] "ATV2"    "- ATV"    "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
##  [3,] "ATV3"    "- ATV"    "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
##  [4,] "ATV4"    "- ATV"    "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
##  [5,] "ATV5"    "+ ATV"    "Senescent ATV-treated cells"                               "sen_ATV"      
##  [6,] "ATV6"    "+ ATV"    "Senescent ATV-treated cells"                               "sen_ATV"      
##  [7,] "ATV7"    "+ ATV"    "Senescent ATV-treated cells"                               "sen_ATV"      
##  [8,] "ATV8"    "+ ATV"    "Senescent ATV-treated cells"                               "sen_ATV"      
##  [9,] "midas1"  "C +P 3"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [10,] "midas2"  "C +P 4"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [11,] "midas3"  "C +P 5"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [12,] "midas4"  "C +P 6"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [13,] "midas5"  "mtD -P 1" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [14,] "midas6"  "mtD -P 2" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [15,] "midas7"  "mtD -P 3" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [16,] "midas8"  "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [17,] "midas9"  "mtD -P 5" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [18,] "midas10" "mtD -P 6" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [19,] "midas11" "mtD +P 1" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [20,] "midas12" "mtD +P 2" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [21,] "midas13" "mtD +P 3" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [22,] "midas14" "mtD +P 4" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [23,] "midas15" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [24,] "midas16" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [25,] "midas17" "C -P 1"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
## [26,] "midas18" "C -P 2"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
## [27,] "midas19" "C -P 3"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
## [28,] "midas20" "C -P 4"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
## [29,] "midas21" "C -P 5"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
## [30,] "midas22" "C -P 6"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
## [31,] "midas23" "C +P 1"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [32,] "midas24" "C +P 2"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"
colnames(betas) <- sampsheet$sampleID
colnames(Mvals) <- sampsheet$sampleID 
row.names(sampsheet) <- sampsheet$sampleID

#Probe annotation
dat_fData <- read_tsv("~/bigdata/EWAStools/arrayAnnotation/EPIC.hg19.manifest.tsv")
length(rownames(betas)) == length(dat_fData$probeID)
## [1] TRUE
sum(rownames(betas) != dat_fData$probeID)
## [1] 865915
#Reorder annotation to match betas
dat_fData <- dat_fData[match(rownames(betas), dat_fData$probeID),]
sum(rownames(betas) != dat_fData$probeID)
## [1] 0
head(dat_fData$probeID)
## [1] "cg00000029" "cg00000103" "cg00000109" "cg00000155" "cg00000158" "cg00000165"
head(rownames(betas))
## [1] "cg00000029" "cg00000103" "cg00000109" "cg00000155" "cg00000158" "cg00000165"
row.names(dat_fData) <- dat_fData$probeID

eset_betas <- ExpressionSet(assayData = betas,
                phenoData = AnnotatedDataFrame(sampsheet),
                featureData = AnnotatedDataFrame(dat_fData)
                )

write_rds(eset_betas, path = "../data/formatted/eset_betas.rds")

eset_Mvals <- ExpressionSet(assayData = Mvals,
                phenoData = AnnotatedDataFrame(sampsheet),
                featureData = AnnotatedDataFrame(dat_fData)
                )

write_rds(eset_Mvals, path = "../data/formatted/eset_Mvals.rds")

2.5 QC on IDATs

The sesame pipeline above already produced cleaned data. This chunk goes backwards and examines QC of the raw data again.

Read in idats as sigsets, apply QC.

The Sigset data structure is an S4 class with 6 slots. Using lapply with readIDATpair creates a list of Sigset objects. Each list element is a Sigset for each sample. To plot or anything across samples, you’ll need to apply a function across the list.

pdat <- pData(eset_Mvals)
IDATprefixes <- searchIDATprefixes(dir.name = "../data/raw/idatFiles/")
ssets <- lapply(IDATprefixes, readIDATpair)
qc10 <- do.call(rbind, lapply(ssets, function(x) as.data.frame(sesameQC(x))))
#add sample names
sum(names(ssets) != pdat$Basename)
## [1] 0
cbind(names(ssets), pdat$Basename)
##       [,1]                  [,2]                 
##  [1,] "203833040074_R01C01" "203833040074_R01C01"
##  [2,] "203833040074_R02C01" "203833040074_R02C01"
##  [3,] "203833040074_R03C01" "203833040074_R03C01"
##  [4,] "203833040074_R04C01" "203833040074_R04C01"
##  [5,] "203833040074_R05C01" "203833040074_R05C01"
##  [6,] "203833040074_R06C01" "203833040074_R06C01"
##  [7,] "203833040074_R07C01" "203833040074_R07C01"
##  [8,] "203833040074_R08C01" "203833040074_R08C01"
##  [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203839170002_R01C01" "203839170002_R01C01"
## [26,] "203839170002_R02C01" "203839170002_R02C01"
## [27,] "203839170002_R03C01" "203839170002_R03C01"
## [28,] "203839170002_R04C01" "203839170002_R04C01"
## [29,] "203839170002_R05C01" "203839170002_R05C01"
## [30,] "203839170002_R06C01" "203839170002_R06C01"
## [31,] "203839170002_R07C01" "203839170002_R07C01"
## [32,] "203839170002_R08C01" "203839170002_R08C01"
qc10 <- qc10 %>%
    mutate(sample_name = pdat$SampleLabel) %>%
    mutate(sample_desc = pdat$Description) %>%
    dplyr::select(sample_name, sample_desc, everything())

#frac_na_cg = percentage of cg probes with NA
#mean_intensity = for type II probes. Low intensity typically bad quality
#mean_beta_cg = mean beta cg probes
#frac_meth_cg = percentage cg probes with beta > 0.7
#frac_unmeth_cg = percentage cg probes with beta < 0.3
#GCT = residual incomplete bisulfite conversion. Closer to one = more complete conversion.
qcvars <- c("sample_name", "sample_desc", "num_probes_cg", "num_na_cg", "frac_na_cg", "mean_intensity", "mean_beta_cg", "frac_meth_cg", "frac_unmeth_cg", "sex", "age", "ethnicity", "GCT")
qc10 %>%
    dplyr::select(any_of(qcvars)) %>%
    kable
sample_name sample_desc num_probes_cg num_na_cg frac_na_cg mean_intensity mean_beta_cg frac_meth_cg frac_unmeth_cg sex age ethnicity GCT
- ATV Non-senescent DMSO-treated control cells 862927 142190 16.47764 3853.198 0.4964446 38.22754 34.88263 FEMALE 4.832618 WHITE 1.129563
- ATV Non-senescent DMSO-treated control cells 862927 146181 16.94014 4293.779 0.4969831 38.37119 34.87205 FEMALE 4.971872 WHITE 1.123527
- ATV Non-senescent DMSO-treated control cells 862927 138624 16.06439 4789.837 0.5022481 38.88884 34.57434 FEMALE 4.618823 WHITE 1.125875
- ATV Non-senescent DMSO-treated control cells 862927 147279 17.06738 4654.279 0.5134016 40.95002 34.04774 FEMALE 4.610379 WHITE 1.137029
+ ATV Senescent ATV-treated cells 862927 157665 18.27095 4822.542 0.5169446 41.47125 33.69882 FEMALE 4.455712 WHITE 1.130623
+ ATV Senescent ATV-treated cells 862927 142226 16.48181 4911.161 0.5186201 41.85134 33.85565 FEMALE 4.275990 WHITE 1.132838
+ ATV Senescent ATV-treated cells 862927 146592 16.98776 5139.840 0.5152724 41.37045 34.03184 FEMALE 4.044046 WHITE 1.139674
+ ATV Senescent ATV-treated cells 862927 144026 16.69040 5281.459 0.5174685 41.79421 33.98368 FEMALE 3.729263 WHITE 1.140673
C +P 3 Non-senescent mtDNA-containing controls with pyruvate 862927 151212 17.52315 4266.443 0.5021308 39.58214 34.10059 FEMALE 4.937657 WHITE 1.139620
C +P 4 Non-senescent mtDNA-containing controls with pyruvate 862927 136238 15.78789 4265.593 0.5080209 40.41674 34.09877 FEMALE 5.257289 WHITE 1.148064
C +P 5 Non-senescent mtDNA-containing controls with pyruvate 862927 134673 15.60653 4748.199 0.5077737 41.17725 35.43667 FEMALE 3.921508 WHITE 1.146814
C +P 6 Non-senescent mtDNA-containing controls with pyruvate 862927 131662 15.25761 4991.073 0.5124335 41.60489 35.41110 FEMALE 4.067941 WHITE 1.148517
mtD -P 1 Senescent mtDNA-depleted cells no pyruvate 862927 151648 17.57368 4599.550 0.5244305 43.07367 33.03176 FEMALE 4.194471 WHITE 1.137215
mtD -P 2 Senescent mtDNA-depleted cells no pyruvate 862927 135063 15.65173 4513.403 0.5291737 43.78126 33.12624 FEMALE 4.220530 WHITE 1.146916
mtD -P 3 Senescent mtDNA-depleted cells no pyruvate 862927 136418 15.80875 5172.337 0.5196899 42.87297 35.09619 FEMALE 4.421055 WHITE 1.152611
mtD -P 4 Senescent mtDNA-depleted cells no pyruvate 862927 137301 15.91108 5378.971 0.5184395 42.65737 35.20670 FEMALE 3.933508 WHITE 1.148173
mtD -P 5 Senescent mtDNA-depleted cells no pyruvate 862927 140866 16.32421 3835.542 0.5115558 41.23613 33.90725 FEMALE 4.985296 WHITE 1.144645
mtD -P 6 Senescent mtDNA-depleted cells no pyruvate 862927 140723 16.30764 4089.288 0.5136177 41.56194 33.88461 FEMALE 3.777414 WHITE 1.144287
mtD +P 1 Non-senescent mtDNA-depleted controls cells with pyruvate 862927 128391 14.87855 4923.509 0.5097017 41.40995 35.61337 FEMALE 5.212871 WHITE 1.156965
mtD +P 2 Non-senescent mtDNA-depleted controls cells with pyruvate 862927 123732 14.33864 5422.586 0.5031447 40.20928 36.20168 FEMALE 4.542549 WHITE 1.163880
mtD +P 3 Non-senescent mtDNA-depleted controls cells with pyruvate 862927 133861 15.51244 5411.625 0.5134522 41.52793 34.09156 FEMALE 3.963279 WHITE 1.149286
mtD +P 4 Non-senescent mtDNA-depleted controls cells with pyruvate 862927 124219 14.39508 5579.510 0.5095529 40.86716 34.56318 FEMALE 4.409198 WHITE 1.152347
mtD +P 5 Non-senescent mtDNA-depleted controls cells with pyruvate 862927 125684 14.56485 5870.422 0.5174637 42.35266 35.16778 FEMALE 3.988534 WHITE 1.154209
mtD +P 6 Non-senescent mtDNA-depleted controls cells with pyruvate 862927 127452 14.76973 5692.319 0.5127907 41.76117 35.44961 FEMALE 3.433061 WHITE 1.151243
C -P 1 Non-senescent mtDNA-containing controls no pyruvate 862927 145220 16.82877 4081.942 0.5077514 40.44324 33.90966 FEMALE 4.977316 WHITE 1.142858
C -P 2 Non-senescent mtDNA-containing controls no pyruvate 862927 143572 16.63779 4457.889 0.5109200 40.89400 33.76163 FEMALE 4.583926 WHITE 1.131215
C -P 3 Non-senescent mtDNA-containing controls no pyruvate 862927 138742 16.07807 5237.084 0.5069778 41.07597 35.48624 FEMALE 4.227716 WHITE 1.141324
C -P 4 Non-senescent mtDNA-containing controls no pyruvate 862927 129921 15.05585 5424.897 0.5104271 41.43759 35.57120 FEMALE 4.422332 WHITE 1.151486
C -P 5 Non-senescent mtDNA-containing controls no pyruvate 862927 143311 16.60755 5429.236 0.5192146 42.30506 33.53525 FEMALE 3.666028 WHITE 1.141509
C -P 6 Non-senescent mtDNA-containing controls no pyruvate 862927 134557 15.59309 5365.433 0.5218914 42.72060 33.64677 FEMALE 3.943228 WHITE 1.143762
C +P 1 Non-senescent mtDNA-containing controls with pyruvate 862927 137805 15.96949 5765.728 0.5133604 42.13291 35.43114 FEMALE 3.655590 WHITE 1.147457
C +P 2 Non-senescent mtDNA-containing controls with pyruvate 862927 135729 15.72891 5567.668 0.5173618 42.70515 35.54616 FEMALE 3.922531 WHITE 1.142287

2.6 Missing analysis of masked Betas

The fraction of NAs are signs of masking due to variety of reasons including failed detection, high background, putative low quality probes etc.

Now we move back to the QCed betas generated from the first step.

First, remove failed samples and probes. How many samples are missing across all probes? How many probes are missing across all samples?

dim(eset_Mvals) #865,918 probes
## Features  Samples 
##   865918       32
e <- exprs(eset_Mvals)
pdat <- pData(eset_Mvals)
miss_sample <- apply(e, 2, function(x) sum(is.na(x))/length(x) )
miss_probe_blank <- apply(e, 1, function(x) sum(is.na(x))/length(x) )
length(miss_sample)
## [1] 32
length(miss_probe_blank)
## [1] 865918
sum(miss_sample >= 0.95) #0 samples have missing rate greater than 95% 
## [1] 0
#no samples that are essentially blanks.
#show missingness for all samples
data.frame(sampleID = names(miss_sample), treatment = pdat$treatment, miss_sample = miss_sample, stringsAsFactors = F) %>%
  kable()
sampleID treatment miss_sample
ATV1 ATV1 noSen_ATV 0.1653240
ATV2 ATV2 noSen_ATV 0.1699641
ATV3 ATV3 noSen_ATV 0.1611273
ATV4 ATV4 noSen_ATV 0.1712310
ATV5 ATV5 sen_ATV 0.1833418
ATV6 ATV6 sen_ATV 0.1653228
ATV7 ATV7 sen_ATV 0.1704168
ATV8 ATV8 sen_ATV 0.1674177
midas1 midas1 noSen_Mt_Py 0.1760120
midas2 midas2 noSen_Mt_Py 0.1584596
midas3 midas3 noSen_Mt_Py 0.1565876
midas4 midas4 noSen_Mt_Py 0.1530457
midas5 midas5 sen_noMt_noPy 0.1764867
midas6 midas6 sen_noMt_noPy 0.1570611
midas7 midas7 sen_noMt_noPy 0.1586074
midas8 midas8 sen_noMt_noPy 0.1596098
midas9 midas9 sen_noMt_noPy 0.1638769
midas10 midas10 sen_noMt_noPy 0.1636991
midas11 midas11 noSen_noMt_Py 0.1491850
midas12 midas12 noSen_noMt_Py 0.1437215
midas13 midas13 noSen_noMt_Py 0.1555979
midas14 midas14 noSen_noMt_Py 0.1443035
midas15 midas15 noSen_noMt_Py 0.1459977
midas16 midas16 noSen_noMt_Py 0.1480683
midas17 midas17 noSen_Mt_noPy 0.1689594
midas18 midas18 noSen_Mt_noPy 0.1670193
midas19 midas19 noSen_Mt_noPy 0.1613236
midas20 midas20 noSen_Mt_noPy 0.1510028
midas21 midas21 noSen_Mt_noPy 0.1666925
midas22 midas22 noSen_Mt_noPy 0.1564282
midas23 midas23 noSen_Mt_Py 0.1601918
midas24 midas24 noSen_Mt_Py 0.1577701
sum(miss_probe_blank >= 1) #119,631 completely missing probes. Many are masked by design.
## [1] 119631
sum(miss_probe_blank >= 0.95) #121,925 Remove these, then determine number of samples with missing rate>0.05.
## [1] 121925
#keep probes with < 0.95 missingness
eset_Mvals_clean <- eset_Mvals[miss_probe_blank < 0.95,]
dim(eset_Mvals_clean) #743,993 probes
## Features  Samples 
##   743993       32
eset_betas_clean <- eset_betas[miss_probe_blank < 0.95,]
dim(eset_betas_clean)
## Features  Samples 
##   743993       32
#After removing failed probes, estimate sample missing rate.
e <- exprs(eset_Mvals_clean)
miss_sample <- apply(e, 2, function(x) sum(is.na(x))/length(x) )
miss_probe <- apply(e, 1, function(x) sum(is.na(x))/length(x) )

data.frame(sampleID = names(miss_sample), treatment = pdat$treatment, miss_sample = miss_sample, stringsAsFactors = F) %>%
  kable()
sampleID treatment miss_sample
ATV1 ATV1 noSen_ATV 0.0285849
ATV2 ATV2 noSen_ATV 0.0339506
ATV3 ATV3 noSen_ATV 0.0237166
ATV4 ATV4 noSen_ATV 0.0354264
ATV5 ATV5 sen_ATV 0.0495099
ATV6 ATV6 sen_ATV 0.0285540
ATV7 ATV7 sen_ATV 0.0344788
ATV8 ATV8 sen_ATV 0.0310030
midas1 midas1 noSen_Mt_Py 0.0409843
midas2 midas2 noSen_Mt_Py 0.0205997
midas3 midas3 noSen_Mt_Py 0.0184061
midas4 midas4 noSen_Mt_Py 0.0143026
midas5 midas5 sen_noMt_noPy 0.0415300
midas6 midas6 sen_noMt_noPy 0.0189814
midas7 midas7 sen_noMt_noPy 0.0207341
midas8 midas8 sen_noMt_noPy 0.0219034
midas9 midas9 sen_noMt_noPy 0.0268739
midas10 midas10 sen_noMt_noPy 0.0266669
midas11 midas11 noSen_noMt_Py 0.0098737
midas12 midas12 noSen_noMt_Py 0.0044113
midas13 midas13 noSen_noMt_Py 0.0172475
midas14 midas14 noSen_noMt_Py 0.0048025
midas15 midas15 noSen_noMt_Py 0.0064073
midas16 midas16 noSen_noMt_Py 0.0086627
midas17 midas17 noSen_Mt_noPy 0.0327732
midas18 midas18 noSen_Mt_noPy 0.0305164
midas19 midas19 noSen_Mt_noPy 0.0238873
midas20 midas20 noSen_Mt_noPy 0.0119504
midas21 midas21 noSen_Mt_noPy 0.0301347
midas22 midas22 noSen_Mt_noPy 0.0181991
midas23 midas23 noSen_Mt_Py 0.0225714
midas24 midas24 noSen_Mt_Py 0.0197663
#After removing failed probes, estimate probe missing rate

sum(miss_probe == 0) #692,879 probes with no missings
## [1] 692879
sum(miss_probe > 0) #51,114 probes with missings in at least one sample
## [1] 51114
miss_probe_f <- cut(miss_probe, breaks = seq(0, 1, 0.1))
table(miss_probe_f)
## miss_probe_f
##   (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8] (0.8,0.9]   (0.9,1] 
##     16557      6662      5351      3599      3875      2704      2740      2910      3672      3044
data.frame(missing_interval = miss_probe_f) %>%
    filter(!is.na(missing_interval)) %>%
    ggplot(aes(missing_interval)) + 
      geom_bar()

2.7 Distributions

Summary of sample IDs and descriptions

p1 <- pData(eset_Mvals_clean)
p1 %>%
    dplyr::select(sampleID, Description) %>%
    kable()
sampleID Description
ATV1 Non-senescent DMSO-treated control cells
ATV2 Non-senescent DMSO-treated control cells
ATV3 Non-senescent DMSO-treated control cells
ATV4 Non-senescent DMSO-treated control cells
ATV5 Senescent ATV-treated cells
ATV6 Senescent ATV-treated cells
ATV7 Senescent ATV-treated cells
ATV8 Senescent ATV-treated cells
midas1 Non-senescent mtDNA-containing controls with pyruvate
midas2 Non-senescent mtDNA-containing controls with pyruvate
midas3 Non-senescent mtDNA-containing controls with pyruvate
midas4 Non-senescent mtDNA-containing controls with pyruvate
midas5 Senescent mtDNA-depleted cells no pyruvate
midas6 Senescent mtDNA-depleted cells no pyruvate
midas7 Senescent mtDNA-depleted cells no pyruvate
midas8 Senescent mtDNA-depleted cells no pyruvate
midas9 Senescent mtDNA-depleted cells no pyruvate
midas10 Senescent mtDNA-depleted cells no pyruvate
midas11 Non-senescent mtDNA-depleted controls cells with pyruvate
midas12 Non-senescent mtDNA-depleted controls cells with pyruvate
midas13 Non-senescent mtDNA-depleted controls cells with pyruvate
midas14 Non-senescent mtDNA-depleted controls cells with pyruvate
midas15 Non-senescent mtDNA-depleted controls cells with pyruvate
midas16 Non-senescent mtDNA-depleted controls cells with pyruvate
midas17 Non-senescent mtDNA-containing controls no pyruvate
midas18 Non-senescent mtDNA-containing controls no pyruvate
midas19 Non-senescent mtDNA-containing controls no pyruvate
midas20 Non-senescent mtDNA-containing controls no pyruvate
midas21 Non-senescent mtDNA-containing controls no pyruvate
midas22 Non-senescent mtDNA-containing controls no pyruvate
midas23 Non-senescent mtDNA-containing controls with pyruvate
midas24 Non-senescent mtDNA-containing controls with pyruvate

2.7.1 Boxplots of M-values

boxplot(exprs(eset_Mvals_clean), las = 2, ylab = "M-values")
abline(h = median(exprs(eset_Mvals_clean)), col = "blue")

2.7.2 Density plot All samples combined

plotDensities(eset_betas_clean, main = "Betas", legend = FALSE)

plotDensities(eset_Mvals_clean, main = "M values", legend = FALSE)

2.7.3 Density plot sample groups

p1 <- pData(eset_Mvals_clean)
p1 %>%
    dplyr::select(sampleID, SampleLabel, Description, treatment) %>%
    kable()
sampleID SampleLabel Description treatment
ATV1 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV2 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV3 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV4 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV5 + ATV Senescent ATV-treated cells sen_ATV
ATV6 + ATV Senescent ATV-treated cells sen_ATV
ATV7 + ATV Senescent ATV-treated cells sen_ATV
ATV8 + ATV Senescent ATV-treated cells sen_ATV
midas1 C +P 3 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas2 C +P 4 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas3 C +P 5 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas4 C +P 6 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas5 mtD -P 1 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas6 mtD -P 2 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas7 mtD -P 3 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas8 mtD -P 4 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas9 mtD -P 5 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas10 mtD -P 6 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas11 mtD +P 1 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas12 mtD +P 2 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas13 mtD +P 3 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas14 mtD +P 4 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas15 mtD +P 5 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas16 mtD +P 6 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas17 C -P 1 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas18 C -P 2 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas19 C -P 3 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas20 C -P 4 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas21 C -P 5 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas22 C -P 6 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas23 C +P 1 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas24 C +P 2 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "senATV" | p1$treatment == "noSen_ATV")]
plotDensities(eset_Mvals_sub, main = "M values ATV", legend = TRUE)

eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "sen_noMt_noPy"  | p1$treatment == "noSen_noMt_Py" | p1$treatment == "noSen_Mt_noPy" | p1$treatment == "noSen_Mt_Py")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS sen and controls", legend = TRUE)

eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "sen_noMt_noPy")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS sen", legend = TRUE)

eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "noSen_noMt_Py" | p1$treatment == "noSen_Mt_noPy" | p1$treatment == "noSen_Mt_Py")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS controls", legend = TRUE)

eset_Mvals_sub <- eset_Mvals_clean[,which(p1$treatment == "noSen_noMt_Py")]
plotDensities(eset_Mvals_sub, main = "M values MiDAS controls no MT Py", legend = TRUE)

2.8 PCA

plot_MDS <- function(mylabel, myeset){
    group <- pData(myeset)[[mylabel]]
    col.group <- group
    levels(col.group) <- brewer.pal(nlevels(col.group), "Dark2")
    col.group.ch <- as.character(col.group)
    plotMDS(myeset, col = col.group.ch, gene.selection = "common", main = paste(mylabel, "labels"), pch = 16)
    legend("topleft", fill = levels(col.group), legend = levels(group))
}

plot_MDS(mylabel = "treatment", myeset = eset_betas_clean)

plot_MDS(mylabel = "treatment", myeset = eset_Mvals_clean)

2.9 PCA ATV samples

plot_MDS <- function(mylabel, myeset){
    group <- pData(myeset)[[mylabel]]
    col.group <- group
    levels(col.group) <- brewer.pal(nlevels(col.group), "Set1")
    col.group.ch <- as.character(col.group)
    plotMDS(myeset, col = col.group.ch, gene.selection = "common", main = paste(mylabel, "labels"), pch = 16)
    legend("top", fill = levels(col.group), legend = levels(group))
}

keep <- str_detect(pData(eset_Mvals_clean)$sampleID, "ATV")
eset_sub <- eset_Mvals_clean[, keep]
plot_MDS(mylabel = "treatment", myeset = eset_sub)

2.10 Identify PCA outliers

myMDS <- plotMDS(eset_Mvals_clean, gene.selection = "common", plot = FALSE)
cluster_top <- myMDS$y[myMDS$y > 0.2 ]
cluster_bottom <- myMDS$x[myMDS$x > 1 ]
cluster_ATV <- myMDS$x[myMDS$x < (-2) ]

#cluster top
pData(eset_Mvals_clean) %>%
    filter(sampleID %in% names(cluster_top)) %>%
    dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
    kable
sampleID SampleLabel treatment Description
midas1 C +P 3 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas2 C +P 4 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas5 mtD -P 1 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas6 mtD -P 2 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas9 mtD -P 5 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas10 mtD -P 6 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas13 mtD +P 3 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas14 mtD +P 4 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas17 C -P 1 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas18 C -P 2 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas21 C -P 5 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas22 C -P 6 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
#cluster bottom
pData(eset_Mvals_clean) %>%
    filter(sampleID %in% names(cluster_bottom)) %>%
    dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
    kable
sampleID SampleLabel treatment Description
midas3 C +P 5 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas4 C +P 6 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas7 mtD -P 3 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas8 mtD -P 4 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas11 mtD +P 1 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas12 mtD +P 2 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas15 mtD +P 5 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas16 mtD +P 6 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas19 C -P 3 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas20 C -P 4 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas23 C +P 1 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas24 C +P 2 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
#cluster ATV
pData(eset_Mvals_clean) %>%
    filter(sampleID %in% names(cluster_ATV)) %>%
    dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
    kable
sampleID SampleLabel treatment Description
ATV1 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV2 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV3 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV4 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV5 + ATV sen_ATV Senescent ATV-treated cells
ATV6 + ATV sen_ATV Senescent ATV-treated cells
ATV7 + ATV sen_ATV Senescent ATV-treated cells
ATV8 + ATV sen_ATV Senescent ATV-treated cells

2.11 Density plots of PCA clusters

plotDensities(eset_Mvals_clean, main = "M values all samples", legend = FALSE)

p1 <- pData(eset_Mvals_clean)
p1 %>%
    dplyr::select(sampleID, SampleLabel, Description, treatment) %>%
    kable()
sampleID SampleLabel Description treatment
ATV1 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV2 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV3 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV4 - ATV Non-senescent DMSO-treated control cells noSen_ATV
ATV5 + ATV Senescent ATV-treated cells sen_ATV
ATV6 + ATV Senescent ATV-treated cells sen_ATV
ATV7 + ATV Senescent ATV-treated cells sen_ATV
ATV8 + ATV Senescent ATV-treated cells sen_ATV
midas1 C +P 3 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas2 C +P 4 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas3 C +P 5 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas4 C +P 6 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas5 mtD -P 1 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas6 mtD -P 2 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas7 mtD -P 3 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas8 mtD -P 4 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas9 mtD -P 5 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas10 mtD -P 6 Senescent mtDNA-depleted cells no pyruvate sen_noMt_noPy
midas11 mtD +P 1 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas12 mtD +P 2 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas13 mtD +P 3 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas14 mtD +P 4 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas15 mtD +P 5 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas16 mtD +P 6 Non-senescent mtDNA-depleted controls cells with pyruvate noSen_noMt_Py
midas17 C -P 1 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas18 C -P 2 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas19 C -P 3 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas20 C -P 4 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas21 C -P 5 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas22 C -P 6 Non-senescent mtDNA-containing controls no pyruvate noSen_Mt_noPy
midas23 C +P 1 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
midas24 C +P 2 Non-senescent mtDNA-containing controls with pyruvate noSen_Mt_Py
eset_Mvals_sub <- eset_Mvals_clean[, p1$sampleID %in% names(cluster_top)]
plotDensities(eset_Mvals_sub, main = "M values cluster top", legend = TRUE)

eset_Mvals_sub <- eset_Mvals_clean[, p1$sampleID %in% names(cluster_bottom)]
plotDensities(eset_Mvals_sub, main = "M values cluster bottom", legend = TRUE)

eset_Mvals_sub <- eset_Mvals_clean[, p1$sampleID %in% names(cluster_ATV)]
plotDensities(eset_Mvals_sub, main = "M values cluster ATV", legend = TRUE)

2.12 Probe-wise analysis of senescence treatment

2.12.1 Model parameterization

Group means parameterization. No intercept.

\[ Y = B_1X_1 + B_2X_2 + B_3X_3 + B_4X_4 + B_5X_5 + B_6X_6 + \epsilon \]

\(B_1\) = mean expression level in noSen_ATV.

\(B_2\) = mean expression level in noSen_Mt_Py.

\(B_3\) = mean expression level in noSen_Mt_noPy.

\(B_4\) = mean expression level in noSen_noMt_Py.

\(B_5\) = mean expression level in sen_ATV.

\(B_6\) = mean expression level in sen_noMt_noPy.

After model fit, contrasts are set that represent difference in group means.

logFC = estimate of the log2-fold-change

CI.L = lower 95% confidence interval for logFC

CI.R = upper 95% confidence interval for logFC

AveExpr = average log2-expression for the probe over all samples

t = moderated t-statistic

P.value = unadjusted P-value

adj.P.value = BH 1995 adjusted P-value

B = log-odds that the variable is differentially abundant.

From the limma user guide: Suppose for example that B = 1.5. The odds of differential expression is exp(1.5)=4.48, i.e, about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., the probability is about 82% that this gene is differentially expressed. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed. The B-statistic is automatically adjusted for multiple testing by assuming that 1% of the genes are expected to be differentially expressed. The p-values and B-statistics will normally rank genes in the same order.

design <- model.matrix(~0 + treatment, data = pData(eset_Mvals_clean))
design
##         treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## ATV1                     1                    0                      0                      0                0                      0
## ATV2                     1                    0                      0                      0                0                      0
## ATV3                     1                    0                      0                      0                0                      0
## ATV4                     1                    0                      0                      0                0                      0
## ATV5                     0                    0                      0                      0                1                      0
## ATV6                     0                    0                      0                      0                1                      0
## ATV7                     0                    0                      0                      0                1                      0
## ATV8                     0                    0                      0                      0                1                      0
## midas1                   0                    1                      0                      0                0                      0
## midas2                   0                    1                      0                      0                0                      0
## midas3                   0                    1                      0                      0                0                      0
## midas4                   0                    1                      0                      0                0                      0
## midas5                   0                    0                      0                      0                0                      1
## midas6                   0                    0                      0                      0                0                      1
## midas7                   0                    0                      0                      0                0                      1
## midas8                   0                    0                      0                      0                0                      1
## midas9                   0                    0                      0                      0                0                      1
## midas10                  0                    0                      0                      0                0                      1
## midas11                  0                    0                      0                      1                0                      0
## midas12                  0                    0                      0                      1                0                      0
## midas13                  0                    0                      0                      1                0                      0
## midas14                  0                    0                      0                      1                0                      0
## midas15                  0                    0                      0                      1                0                      0
## midas16                  0                    0                      0                      1                0                      0
## midas17                  0                    0                      1                      0                0                      0
## midas18                  0                    0                      1                      0                0                      0
## midas19                  0                    0                      1                      0                0                      0
## midas20                  0                    0                      1                      0                0                      0
## midas21                  0                    0                      1                      0                0                      0
## midas22                  0                    0                      1                      0                0                      0
## midas23                  0                    1                      0                      0                0                      0
## midas24                  0                    1                      0                      0                0                      0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$treatment
## [1] "contr.treatment"
colSums(design)
##     treatmentnoSen_ATV   treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py       treatmentsen_ATV treatmentsen_noMt_noPy 
##                      4                      6                      6                      6                      4                      6
cm <- makeContrasts(ATVvControl = treatmentsen_ATV - treatmentnoSen_ATV,
            SenMidasvNoSenMtnoPy = treatmentsen_noMt_noPy - treatmentnoSen_Mt_noPy,
            SenMidasvNoSenNoMtPy = treatmentsen_noMt_noPy - treatmentnoSen_noMt_Py,
            levels = design
            )
fit <- lmFit(eset_Mvals_clean, design)
## Warning: Partial NA coefficients for 19911 probe(s)
fit2 <- contrasts.fit(fit, contrasts = cm)
fit2 <- eBayes(fit2)

results <- decideTests(fit2)
summary(results)
##        ATVvControl SenMidasvNoSenMtnoPy SenMidasvNoSenNoMtPy
## Down             1                    0                25109
## NotSig      725639               734633               705171
## Up               1                    0                 5949
vennDiagram(results)

volcanoplot(fit2, coef = "ATVvControl" , main = "ATV vs control")

volcanoplot(fit2, coef = "SenMidasvNoSenMtnoPy", main = "MiDAS vs control with MtDNA")

volcanoplot(fit2, coef = "SenMidasvNoSenNoMtPy", main = "MiDAS vs control without mtDNA")

annot_cols <- c("CpG_chrm", "CpG_beg", "probeID", "gene")
topTable(fit2, coef = "ATVvControl", genelist = fit2$genes[,annot_cols], p.value = 0.05, confint = TRUE) %>%
    kable()
CpG_chrm CpG_beg probeID gene logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg05454501 chr5 76373390 cg05454501 ZBED3 -0.7813445 -1.239024 -0.3236651 -5.109639 -7.087593 1e-07 0.0253733 3.682465
cg25892587 chr10 3666168 cg25892587 NA 4.4394857 4.179659 4.6993119 -1.980614 15.395070 0e+00 0.0020826 -1.136984
topTable(fit2, coef = "SenMidasvNoSenMtnoPy", genelist = fit2$genes[,annot_cols], p.value = 0.05, confint = TRUE) %>%
    kable()
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames = list(: The table should have a header (column names)

|| || || ||

topTable(fit2, coef = "SenMidasvNoSenNoMtPy", genelist = fit2$genes[,annot_cols], p.value = 0.05, confint = TRUE) %>%
    kable()
CpG_chrm CpG_beg probeID gene logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg00675940 chr5 42923549 cg00675940 NA -1.4071519 -1.5738946 -1.2404091 1.3693795 -8.245859 0 0.0013562 10.494661
cg21545248 chr5 149402497 cg21545248 HMGXB3;RPS20P4 -0.7937433 -1.1107432 -0.4767434 0.5253161 -8.205613 0 0.0013562 10.406157
cg21189727 chr8 67940949 cg21189727 PPP1R42 -0.7814580 -1.0559005 -0.5070156 -3.9713074 -7.872901 0 0.0021366 9.664540
cg02780400 chr10 43187468 cg02780400 LINC01518 -0.6788858 -0.8948734 -0.4628981 0.9014428 -7.648428 0 0.0025989 9.154314
cg26591221 chr16 88942334 cg26591221 CBFA2T3 -0.9453924 -1.1290296 -0.7617553 4.2901905 -7.590883 0 0.0025989 9.022258
cg00329052 chr4 169935000 cg00329052 NA -1.4317712 -1.7298707 -1.1336717 1.4300259 -7.471123 0 0.0025989 8.745811
cg09107055 chr16 46854829 cg09107055 C16orf87 -1.0151181 -1.4286880 -0.6015482 -0.3137258 -7.451222 0 0.0025989 8.699662
cg00402311 chr8 62282814 cg00402311 CLVS1 -0.8368593 -1.0099842 -0.6637343 0.1290056 -7.426037 0 0.0025989 8.641177
cg03338185 chr1 16304820 cg03338185 NA -1.1278547 -1.3348646 -0.9208447 1.0472024 -7.325484 0 0.0026091 8.406727
cg04398932 chr12 133656737 cg04398932 ZNF140 0.8484672 0.6160453 1.0808890 -2.8908636 7.398473 0 0.0026091 8.389052
#save results 
resultAll <- topTable(fit2, coef = "ATVvControl", sort.by = "none", number = nrow(fit2), confint = TRUE)
write_csv(resultAll, path = paste0(dir_out, "ATVvControl.csv"))
## Warning: The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
## Please use the `file` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
resultAll <- topTable(fit2, coef = "SenMidasvNoSenMtnoPy", sort.by = "none", number = nrow(fit2), confint = TRUE)
write_csv(resultAll, path = paste0(dir_out, "SenMidasvNoSenMtnoPy.csv"))
resultAll <- topTable(fit2, coef = "SenMidasvNoSenNoMtPy", sort.by = "none", number = nrow(fit2), confint = TRUE)
write_csv(resultAll, path = paste0(dir_out, "SenMidasvNoSenNoMtPy.csv"))

2.12.2 Compare old ATV results with sesame results

Array annotation downloaded here.

Old results are my quick t-tests performed on Nate’s Noob-normalized data using minfi. I saw lots of significant ATV associations in old results, but not from the sesame-based results. What are the differences?

There are many differences between old and new ATV results. Differences include:

  1. Preprocessing. Old results from minfi noob preprocessing. New results from sesame noob and pOOBAH.

  2. Methylation values. Old results calculated from Betas. New results calculated from M-values.

  3. Association testing. Old results calculated from t-test. New results calculated from moderated t-tests using limma. Limma should have more power, and should be more robust because variance won’t be influenced by probes with high variance.

Results below show there are many differences between old results and new Sesame with limma.

fdat <- read_tsv("~/bigdata/EWAStools/arrayAnnotation/EPIC.hg19.manifest.tsv")
dat <- read_csv(paste0(dir_out, "ATVvControl.csv"))
datOld <- read_csv(paste0(dir_out, "probewise_ATV.csv"))

datOld %>%
    filter(P_BH <= 0.05) %>%
    left_join(fdat, by = "probeID") %>%
    summarize(n_masked = sum(MASK_general), n_not_masked = sum(!MASK_general))
## # A tibble: 1 x 2
##   n_masked n_not_masked
##      <int>        <int>
## 1       13          167
names(datOld)[2:4] <- paste("v1", names(datOld)[2:4], sep = "_")
datOld %>%
    filter(v1_P_BH <= 0.05) %>%
    left_join(dat, by = "probeID") %>%
    select(probeID, v1_T_STAT, v1_P, v1_P_BH, t, P.Value, adj.P.Val, everything()) %>%
    arrange(v1_P) %>%
    slice_head(n = 10)
## # A tibble: 10 x 68
##    probeID v1_T_STAT    v1_P v1_P_BH        t P.Value adj.P.Val CpG_chrm CpG_beg CpG_end probe_strand address_A address_B channel designType nextBase nextBaseRef probeType orientation probeCpGcnt context35 probeBeg
##    <chr>       <dbl>   <dbl>   <dbl>    <dbl>   <dbl>     <dbl> <chr>      <dbl>   <dbl> <chr>            <dbl>     <dbl> <chr>   <chr>      <chr>    <chr>       <chr>     <chr>             <dbl>     <dbl>    <dbl>
##  1 cg1625~      37.7 2.35e-8  0.0166  0.207     0.838      1.00 chr6      4.71e7  4.71e7 +             31663185        NA Both    II         G/A      C           cg        up                    1         1   4.71e7
##  2 cg1352~      36.8 4.74e-8  0.0166  0.00730   0.994      1.00 chr12     6.50e7  6.50e7 -             97785272        NA Both    II         G/A      C           cg        down                  1         2   6.50e7
##  3 cg0258~      35.1 5.75e-8  0.0166  0.130     0.898      1.00 chr13     3.42e7  3.42e7 +             22661284        NA Both    II         G/A      C           cg        up                    1         1   3.42e7
##  4 cg1602~      28.7 1.38e-7  0.0226  0.486     0.632      1.00 chr2      4.37e7  4.37e7 -             36620188        NA Both    II         G/A      C           cg        down                  1         2   4.37e7
##  5 cg1397~      36.9 1.50e-7  0.0226  0.00566   0.996      1.00 chr1      1.63e8  1.63e8 +             80673585        NA Both    II         G/A      C           cg        up                    1         1   1.63e8
##  6 cg1679~      26.5 1.96e-7  0.0226 -0.0488    0.961      1.00 chr10     7.19e7  7.19e7 -             55810162        NA Both    II         G/A      C           cg        down                  1         2   7.19e7
##  7 cg2306~      26.8 2.06e-7  0.0226 NA        NA         NA    <NA>     NA      NA      <NA>                NA        NA <NA>    <NA>       <NA>     <NA>        <NA>      <NA>                 NA        NA  NA     
##  8 cg1669~      30.8 2.12e-7  0.0226  0.222     0.825      1.00 chr3      7.11e7  7.11e7 -             73768831        NA Both    II         G/A      C           cg        down                  0         1   7.11e7
##  9 cg1783~      36.7 2.41e-7  0.0226 -0.0778    0.938      1.00 chr17     1.98e7  1.98e7 +             16806277        NA Both    II         G/A      C           cg        up                    1         1   1.98e7
## 10 cg1469~      33.4 2.62e-7  0.0226 -0.204     0.839      1.00 chr12     9.25e7  9.25e7 +             88600417        NA Both    II         G/A      C           cg        up                    1         1   9.25e7
## # ... with 46 more variables: probeEnd <dbl>, ProbeSeq_A <chr>, ProbeSeq_B <chr>, gene <chr>, gene_HGNC <chr>, chrm_A <chr>, beg_A <dbl>, flag_A <dbl>, mapQ_A <dbl>, cigar_A <chr>, NM_A <dbl>, chrm_B <chr>, beg_B <dbl>,
## #   flag_B <dbl>, mapQ_B <dbl>, cigar_B <chr>, NM_B <dbl>, wDecoy_chrm_A <chr>, wDecoy_beg_A <dbl>, wDecoy_flag_A <dbl>, wDecoy_mapQ_A <dbl>, wDecoy_cigar_A <chr>, wDecoy_NM_A <dbl>, wDecoy_chrm_B <chr>,
## #   wDecoy_beg_B <dbl>, wDecoy_flag_B <dbl>, wDecoy_mapQ_B <dbl>, wDecoy_cigar_B <chr>, wDecoy_NM_B <dbl>, posMatch <lgl>, MASK_mapping <lgl>, MASK_typeINextBaseSwitch <lgl>, MASK_rmsk15 <lgl>, MASK_sub40_copy <lgl>,
## #   MASK_sub35_copy <lgl>, MASK_sub30_copy <lgl>, MASK_sub25_copy <lgl>, MASK_snp5_common <lgl>, MASK_snp5_GMAF1p <lgl>, MASK_extBase <lgl>, MASK_general <lgl>, logFC <dbl>, CI.L <dbl>, CI.R <dbl>, AveExpr <dbl>,
## #   B <dbl>
datMerge <- inner_join(datOld, dat, by = "probeID") 
cor1 <- cor(datMerge$v1_T_STAT, datMerge$t, use = "complete.obs")
if(cor1 < 0.1) cor1 <- 0.1

datOld %>%
    inner_join(dat, by = "probeID") %>%
    filter(!is.na(v1_T_STAT)) %>%
    filter(!is.na(t)) %>%
    ggplot(aes(x = v1_T_STAT, y = t)) +
    geom_hex(binwidth = c(cor1, cor1)) +
    labs(x = "old results (minfi noob t-test)", 
         y = "new results",
         caption = "Old results are from Nate minfi noob data with Dan t-test."
         )

# Minfi processing

Since the sesame-based results are so different from the noob-normalized minfi results, let me try minfi noob myself, with some additional QC steps. Besides, the minfi noob results had many significant associations.

2.13 Replicate old results from Nate’s data

This code chunk creates a noob processed dataset from the raw data. Should be the same as data from Nate.

Output is a matrix of betas. Then, apply unpaired t-test and compare to my previous results. Results should replicate, as long as my minfi noob preprocess was the same as Nate’s.

Summary methods: Minfi noob Beta values Unpaired t-test

If I can replicate results, then I start adding different layers to see if significant associations can be retained while still performing robust QC.

baseDir <- "../data/raw/idatFiles/"
targets <- read.csv(paste0(baseDir, "SampleSheet.csv"))
targets$BasenameOriginal <- as.character(targets$Basename)
targets$Slide <- substr(targets$Basename, 1, 12)
targets$Basename <- paste0(baseDir, targets$Slide, "/", targets$BasenameOriginal)

rgSet <- read.metharray.exp(base = NULL, targets = targets, recursive = TRUE, force = TRUE)

#must load manifest to perform preprocessNoob
Mset <- preprocessNoob(rgSet)
gset <- mapToGenome(Mset)
grset <- ratioConvert(gset, what = "both")
beta_n <- getBeta(grset)
ewas <- data.frame(ID = row.names(beta_n), beta_n)
names(ewas) <- str_replace_all(names(ewas), pattern = "^X", replacement = "")
names(ewas)[1] <- "probeID"

#Targets and sample IDs in same order
sum(names(ewas)[-1] != targets$BasenameOriginal)
## [1] 0
cbind(names(ewas)[-1], targets$BasenameOriginal)
##       [,1]                  [,2]                 
##  [1,] "203839170002_R01C01" "203839170002_R01C01"
##  [2,] "203839170002_R02C01" "203839170002_R02C01"
##  [3,] "203839170002_R03C01" "203839170002_R03C01"
##  [4,] "203839170002_R04C01" "203839170002_R04C01"
##  [5,] "203839170002_R05C01" "203839170002_R05C01"
##  [6,] "203839170002_R06C01" "203839170002_R06C01"
##  [7,] "203839170002_R07C01" "203839170002_R07C01"
##  [8,] "203839170002_R08C01" "203839170002_R08C01"
##  [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203833040074_R01C01" "203833040074_R01C01"
## [26,] "203833040074_R02C01" "203833040074_R02C01"
## [27,] "203833040074_R03C01" "203833040074_R03C01"
## [28,] "203833040074_R04C01" "203833040074_R04C01"
## [29,] "203833040074_R05C01" "203833040074_R05C01"
## [30,] "203833040074_R06C01" "203833040074_R06C01"
## [31,] "203833040074_R07C01" "203833040074_R07C01"
## [32,] "203833040074_R08C01" "203833040074_R08C01"
#create new variables in sample sheet to indicate contrasts
samp <- targets %>%
    mutate(ATV = ifelse(str_detect(ExternalSampleID, "\\+ ATV"), 1L, 0L))
samp$ATV[str_detect(samp$ExternalSampleID, ".*ATV.*", negate = TRUE)] <- NA
cbind(samp$ATV, samp$ExternalSampleID)
##       [,1] [,2]      
##  [1,] NA   "C -P 1"  
##  [2,] NA   "C +P 1"  
##  [3,] NA   "mtD -P 1"
##  [4,] NA   "mtD +P 1"
##  [5,] NA   "C -P 2"  
##  [6,] NA   "C +P 2"  
##  [7,] NA   "mtD -P 2"
##  [8,] NA   "mtD +P 2"
##  [9,] NA   "C -P 3"  
## [10,] NA   "C +P 3"  
## [11,] NA   "mtD -P 3"
## [12,] NA   "mtD +P 3"
## [13,] NA   "C -P 4"  
## [14,] NA   "C +P 4"  
## [15,] NA   "mtD -P 4"
## [16,] NA   "mtD +P 4"
## [17,] NA   "C -P 5"  
## [18,] NA   "C +P 5"  
## [19,] NA   "mtD -P 5"
## [20,] NA   "mtD +P 5"
## [21,] NA   "C -P 6"  
## [22,] NA   "C +P 6"  
## [23,] NA   "mtD -P 6"
## [24,] NA   "mtD +P 6"
## [25,] "0"  " - ATV 1"
## [26,] "1"  " + ATV 1"
## [27,] "0"  " - ATV 2"
## [28,] "1"  " + ATV 2"
## [29,] "0"  " - ATV 3"
## [30,] "1"  " + ATV 3"
## [31,] "0"  " - ATV 4"
## [32,] "1"  " + ATV 4"
betaMat <- ewas %>%
    dplyr::select(-probeID) %>%
    as.matrix

# Association testing
lm_fun <- function(cg){
  out <- tryCatch({
    lm1 <- t.test(cg ~ ATV, data = samp, na.action = na.omit)
    tstat <- lm1$statistic
    p <- lm1$p.value
    return(c(tstat, p))
  
  }, error = function(e) rep(NA, 2)
          )
}


results <- apply(betaMat, 1, lm_fun)
results <- t(results)
colnames(results) <- c("T_STAT", "P")
resultsTB <- as_tibble(results)
resultsTB <- resultsTB %>%
    mutate(probeID = row.names(results)) %>%
    dplyr::select(probeID, everything())
adjp1 <- mt.rawp2adjp(resultsTB[["P"]], proc = "BH")
resultsTB <- resultsTB %>%
    mutate(P_BH = adjp1$adjp[order(adjp1$index), "BH"])

sum(resultsTB$P_BH <= 0.05)
## [1] 180
#Are new results the same as the old results from data from Nate?
datOld <- read_csv("../results/probewise_ATV.csv")
sum(datOld$P_BH <= 0.05)
## [1] 180
names(datOld)[2:4] <- paste("v1", names(datOld)[2:4], sep = "_")
datOld %>%
    filter(v1_P_BH <= 0.05) %>%
    left_join(resultsTB, by = "probeID") %>%
    arrange(v1_P) %>%
        slice_head(n = 10) %>%
    kable(digits = 9)
probeID v1_T_STAT v1_P v1_P_BH T_STAT P P_BH
cg16256574 37.73952 2.40e-08 0.01660825 37.73952 2.40e-08 0.01660825
cg13529355 36.82122 4.70e-08 0.01660825 36.82122 4.70e-08 0.01660825
cg02586306 35.07179 5.80e-08 0.01660825 35.07179 5.80e-08 0.01660825
cg16029913 28.74487 1.38e-07 0.02264537 28.74487 1.38e-07 0.02264537
cg13979492 36.92012 1.50e-07 0.02264537 36.92012 1.50e-07 0.02264537
cg16794634 26.54316 1.96e-07 0.02264537 26.54316 1.96e-07 0.02264537
cg23068081 26.84196 2.06e-07 0.02264537 26.84196 2.06e-07 0.02264537
cg16697896 30.79678 2.12e-07 0.02264537 30.79678 2.12e-07 0.02264537
cg17834252 36.70854 2.41e-07 0.02264537 36.70854 2.41e-07 0.02264537
cg14690837 33.36396 2.62e-07 0.02264537 33.36396 2.62e-07 0.02264537
datOld %>%
    inner_join(resultsTB, by = "probeID") %>%
    filter(!is.na(v1_T_STAT)) %>%
    filter(!is.na(T_STAT)) %>%
    ggplot(aes(x = v1_T_STAT, y = T_STAT)) +
    geom_bin2d(binwidth = c(0.8, 0.8)) +
    labs(x = "old results (Nate minfi noob t-test)", 
         y = "new results (Dan minfi noob t-test)",
         caption = "Old results are from Nate minfi noob data with Dan t-test. New results are from Dan minfi noob data with Dan t-test."
         )

3 Minfi noob normalization and limma

Now that I can replicate my old results, let’s see if simply adding limma on top removes all of the significant findings.

Summary methods: Minfi noob Beta values Limma

baseDir <- "../data/raw/idatFiles/"
targets <- read.csv(paste0(baseDir, "SampleSheet.csv"))
targets$BasenameOriginal <- as.character(targets$Basename)
targets$Slide <- substr(targets$Basename, 1, 12)
targets$Basename <- paste0(baseDir, targets$Slide, "/", targets$BasenameOriginal)

rgSet <- read.metharray.exp(base = NULL, targets = targets, recursive = TRUE, force = TRUE)

#must load manifest to perform preprocessNoob
Mset <- preprocessNoob(rgSet)
gset <- mapToGenome(Mset)
grset <- ratioConvert(gset, what = "both")
bVals <- getBeta(grset)

#Targets and sample IDs in same order
sum(names(bVals) != targets$BasenameOriginal)
## [1] 0
cbind(names(bVals), targets$BasenameOriginal)
##       [,1]                 
##  [1,] "203839170002_R01C01"
##  [2,] "203839170002_R02C01"
##  [3,] "203839170002_R03C01"
##  [4,] "203839170002_R04C01"
##  [5,] "203839170002_R05C01"
##  [6,] "203839170002_R06C01"
##  [7,] "203839170002_R07C01"
##  [8,] "203839170002_R08C01"
##  [9,] "203836210101_R01C01"
## [10,] "203836210101_R02C01"
## [11,] "203836210101_R03C01"
## [12,] "203836210101_R04C01"
## [13,] "203836210101_R05C01"
## [14,] "203836210101_R06C01"
## [15,] "203836210101_R07C01"
## [16,] "203836210101_R08C01"
## [17,] "203839170001_R01C01"
## [18,] "203839170001_R02C01"
## [19,] "203839170001_R03C01"
## [20,] "203839170001_R04C01"
## [21,] "203839170001_R05C01"
## [22,] "203839170001_R06C01"
## [23,] "203839170001_R07C01"
## [24,] "203839170001_R08C01"
## [25,] "203833040074_R01C01"
## [26,] "203833040074_R02C01"
## [27,] "203833040074_R03C01"
## [28,] "203833040074_R04C01"
## [29,] "203833040074_R05C01"
## [30,] "203833040074_R06C01"
## [31,] "203833040074_R07C01"
## [32,] "203833040074_R08C01"
#create new variables in sample sheet to indicate contrasts
translate_trt <- function(ch){
  switch(ch,
     "Non-senescent DMSO-treated control cells" = "noSen_ATV",
     "Non-senescent mtDNA-containing controls no pyruvate" = "noSen_Mt_noPy", 
     "Non-senescent mtDNA-containing controls with pyruvate" = "noSen_Mt_Py",
     "Non-senescent mtDNA-depleted controls cells with pyruvate" = "noSen_noMt_Py",
     "Senescent ATV-treated cells" = "sen_ATV",
     "Senescent mtDNA-depleted cells no pyruvate" = "sen_noMt_noPy"
     )
}

targets <- targets %>%
    mutate(treatment = map_chr(Description, translate_trt)) %>%
    mutate(treatment = factor(treatment))

design <- model.matrix(~0 + treatment, data = targets)
design
##    treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 1                   0                    0                      1                      0                0                      0
## 2                   0                    0                      1                      0                0                      0
## 3                   0                    0                      1                      0                0                      0
## 4                   0                    0                      1                      0                0                      0
## 5                   0                    0                      1                      0                0                      0
## 6                   0                    0                      1                      0                0                      0
## 7                   0                    1                      0                      0                0                      0
## 8                   0                    1                      0                      0                0                      0
## 9                   0                    1                      0                      0                0                      0
## 10                  0                    1                      0                      0                0                      0
## 11                  0                    1                      0                      0                0                      0
## 12                  0                    1                      0                      0                0                      0
## 13                  0                    0                      0                      0                0                      1
## 14                  0                    0                      0                      0                0                      1
## 15                  0                    0                      0                      0                0                      1
## 16                  0                    0                      0                      0                0                      1
## 17                  0                    0                      0                      0                0                      1
## 18                  0                    0                      0                      0                0                      1
## 19                  0                    0                      0                      1                0                      0
## 20                  0                    0                      0                      1                0                      0
## 21                  0                    0                      0                      1                0                      0
## 22                  0                    0                      0                      1                0                      0
## 23                  0                    0                      0                      1                0                      0
## 24                  0                    0                      0                      1                0                      0
## 25                  1                    0                      0                      0                0                      0
## 26                  1                    0                      0                      0                0                      0
## 27                  1                    0                      0                      0                0                      0
## 28                  1                    0                      0                      0                0                      0
## 29                  0                    0                      0                      0                1                      0
## 30                  0                    0                      0                      0                1                      0
## 31                  0                    0                      0                      0                1                      0
## 32                  0                    0                      0                      0                1                      0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$treatment
## [1] "contr.treatment"
colSums(design)
##     treatmentnoSen_ATV   treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py       treatmentsen_ATV treatmentsen_noMt_noPy 
##                      4                      6                      6                      6                      4                      6
cm <- makeContrasts(ATVvControl = treatmentsen_ATV - treatmentnoSen_ATV,
            SenMidasvNoSenMtnoPy = treatmentsen_noMt_noPy - treatmentnoSen_Mt_noPy,
            SenMidasvNoSenNoMtPy = treatmentsen_noMt_noPy - treatmentnoSen_noMt_Py,
            levels = design
            )
fit <- lmFit(bVals, design)
fit2 <- contrasts.fit(fit, contrasts = cm)
fit2 <- eBayes(fit2)

results <- decideTests(fit2)
summary(results)
##        ATVvControl SenMidasvNoSenMtnoPy SenMidasvNoSenNoMtPy
## Down             9                    0                22857
## NotSig      865848               865859               838077
## Up               2                    0                 4925
vennDiagram(results)

volcanoplot(fit2, coef = "ATVvControl" , main = "ATV vs control")

volcanoplot(fit2, coef = "SenMidasvNoSenMtnoPy", main = "MiDAS vs control with MtDNA")

volcanoplot(fit2, coef = "SenMidasvNoSenNoMtPy", main = "MiDAS vs control without mtDNA")

dat <- topTable(fit2, coef = "ATVvControl", n = Inf)
dat <- dat %>%
    mutate(probeID = row.names(dat))
topTable(fit2, coef = "ATVvControl", confint = TRUE) %>%
    kable()
logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg00645473 0.0535270 0.0405843 0.0664696 0.8966459 8.481569 0e+00 0.0034060 10.763888
cg17652435 -0.0460689 -0.0577523 -0.0343855 0.0896966 -8.086617 0e+00 0.0043736 9.812713
cg05454501 -0.0229190 -0.0293537 -0.0164842 0.0354585 -7.304490 1e-07 0.0200366 7.870240
cg01465169 -0.0378045 -0.0488819 -0.0267272 0.0560658 -6.999000 2e-07 0.0275752 7.091534
cg01971137 -0.0290129 -0.0375412 -0.0204846 0.0341193 -6.976785 2e-07 0.0275752 7.034500
cg26953727 -0.0429698 -0.0557349 -0.0302048 0.0761805 -6.903504 2e-07 0.0277149 6.845977
cg09857158 -0.0354711 -0.0461575 -0.0247847 0.0499359 -6.807232 2e-07 0.0304142 6.597437
cg22621867 -0.0302225 -0.0394808 -0.0209642 0.0765190 -6.694638 3e-07 0.0317472 6.305544
cg18820209 -0.0315769 -0.0412587 -0.0218951 0.0684725 -6.688671 3e-07 0.0317472 6.290039
cg05238741 -0.0769146 -0.1006260 -0.0532031 0.0498812 -6.652392 4e-07 0.0317472 6.195697
topTable(fit2, coef = "SenMidasvNoSenMtnoPy", confint = TRUE) %>%
    kable()
logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg21720999 0.0361671 0.0246498 0.0476845 0.8903936 6.440044 6.0e-07 0.2082651 5.468084
cg04247553 0.0554357 0.0377487 0.0731227 0.7007547 6.427798 7.0e-07 0.2082651 5.435857
cg22844780 0.1084488 0.0732490 0.1436485 0.2997607 6.318473 9.0e-07 0.2082651 5.147576
cg12520327 -0.0556408 -0.0738038 -0.0374778 0.1186453 -6.282491 1.0e-06 0.2082651 5.052478
cg04408387 0.4124997 0.2741891 0.5508104 0.5476185 6.116385 1.5e-06 0.2491065 4.612160
cg12770471 0.0499735 0.0328667 0.0670804 0.6999067 5.990967 2.1e-06 0.2491065 4.278385
cg01014399 -0.1691188 -0.2270361 -0.1112015 0.3953825 -5.988395 2.1e-06 0.2491065 4.271530
cg20416110 -0.0157394 -0.0211912 -0.0102877 0.0417993 -5.920801 2.5e-06 0.2491065 4.091209
cg00221525 -0.0686183 -0.0924379 -0.0447988 0.1503046 -5.907908 2.6e-06 0.2491065 4.056781
cg08080985 -0.0402609 -0.0546679 -0.0258539 0.0853104 -5.731093 4.1e-06 0.3420635 3.583737
topTable(fit2, coef = "SenMidasvNoSenNoMtPy", confint = TRUE) %>%
    kable()
logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg26591221 -0.0256581 -0.0319110 -0.0194051 0.9650291 -8.415214 0 0.0021684 10.453092
cg22513166 -0.0969465 -0.1212837 -0.0726094 0.2972196 -8.169375 0 0.0021684 9.858456
cg09107055 -0.1496877 -0.1876376 -0.1117379 0.3679817 -8.089158 0 0.0021684 9.662731
cg00675940 -0.1656357 -0.2076895 -0.1235818 0.6466739 -8.077467 0 0.0021684 9.634140
cg21189727 -0.0378747 -0.0476890 -0.0280604 0.0760784 -7.914374 0 0.0021684 9.233425
cg21069829 -0.1070371 -0.1347872 -0.0792870 0.6530591 -7.910367 0 0.0021684 9.223537
cg00399115 -0.3699645 -0.4664988 -0.2734303 0.1286240 -7.859691 0 0.0021684 9.098313
cg10138791 -0.2355833 -0.2984960 -0.1726706 0.5639089 -7.679504 0 0.0028219 8.650461
cg15313859 -0.1539862 -0.1956021 -0.1123703 0.3733631 -7.588378 0 0.0028219 8.422442
cg05025392 -0.1497952 -0.1904469 -0.1091435 0.5391017 -7.556947 0 0.0028219 8.343556
#Compare old and new ATV results
datOld <- read_csv(paste0(dir_out, "probewise_ATV.csv"))
names(datOld)[2:4] <- paste("v1", names(datOld)[2:4], sep = "_")
datOld %>%
    filter(v1_P_BH <= 0.05) %>%
    left_join(dat, by = "probeID") %>%
    dplyr::select(probeID, v1_T_STAT, v1_P, v1_P_BH, t, P.Value, adj.P.Val, everything()) %>%
    arrange(v1_P) %>%
    slice_head(n = 10)
## # A tibble: 10 x 10
##    probeID    v1_T_STAT         v1_P v1_P_BH        t P.Value adj.P.Val     logFC AveExpr     B
##    <chr>          <dbl>        <dbl>   <dbl>    <dbl>   <dbl>     <dbl>     <dbl>   <dbl> <dbl>
##  1 cg16256574      37.7 0.0000000235  0.0166  0.132     0.896      1.00  0.0186    0.332  -7.33
##  2 cg13529355      36.8 0.0000000474  0.0166 -0.0752    0.941      1.00 -0.00881   0.423  -7.34
##  3 cg02586306      35.1 0.0000000575  0.0166  0.165     0.870      1.00  0.00437   0.151  -7.33
##  4 cg16029913      28.7 0.000000138   0.0226 -0.0151    0.988      1.00 -0.00161   0.517  -7.34
##  5 cg13979492      36.9 0.000000150   0.0226 -0.00925   0.993      1.00 -0.000942  0.689  -7.34
##  6 cg16794634      26.5 0.000000196   0.0226 -0.204     0.840      1.00 -0.00725   0.824  -7.32
##  7 cg23068081      26.8 0.000000206   0.0226  0.0366    0.971      1.00  0.00117   0.692  -7.34
##  8 cg16697896      30.8 0.000000212   0.0226  0.122     0.904      1.00  0.0107    0.357  -7.34
##  9 cg17834252      36.7 0.000000241   0.0226 -0.0808    0.936      1.00 -0.0115    0.302  -7.34
## 10 cg14690837      33.4 0.000000262   0.0226 -0.206     0.838      1.00 -0.00806   0.0997 -7.32
datMerge <- inner_join(datOld, dat, by = "probeID") 
cor1 <- cor(datMerge$v1_T_STAT, datMerge$t, use = "complete.obs")
if(cor1 < 0.1) cor1 <- 0.1

datOld %>%
    inner_join(dat, by = "probeID") %>%
    filter(!is.na(v1_T_STAT)) %>%
    filter(!is.na(t)) %>%
    ggplot(aes(x = v1_T_STAT, y = t)) +
    geom_hex(binwidth = c(cor1, cor1)) +
    labs(x = "old results (minfi noob t-test)", 
         y = "new results (minfi noob, beta values, limma)",
         caption = "Old results are from Nate minfi noob data with Dan t-test. New results are from minfi noob generating beta values that are analyzed with limma"
         )

4 Minfi QC, noob, Mvals and limma for association testing

Minfi noob normalized data with unpaired t-test replicates the old results. However, adding limma made the results totally different. What do we trust? Limma is more trustworthy, especially for small sample sizes. The test-statistics for the vanilla t-tests were way too high (39!) to be believed, which is most likely due to unstable variance estimates. Limma takes care of this, so we should move forward with limma.

Limma analysis of beta values still retained significant findings for ATV contrast. Now, what about using M-values with basic minfi noob and limma?

So, this will be the official analysis using minfi processing with additional QC, using Mvalues and limma for association testing.

Create an RGChannelSet object.

Detection P-values. Set to missing. Remove bad samples and probes.

Small detection p-values = good signals.

Detection P-value > 0.05 = bad signal. Set to missing.

4.1 Minfi data import

baseDir <- "../data/raw/idatFiles/"
targets <- read.csv(paste0(baseDir, "SampleSheet.csv"))
targets$BasenameOriginal <- as.character(targets$Basename)
targets$Slide <- substr(targets$Basename, 1, 12)
targets$Basename <- paste0(baseDir, targets$Slide, "/", targets$BasenameOriginal)

rgSet <- read.metharray.exp(base = NULL, targets = targets, recursive = TRUE, force = TRUE)

#Add nicer sample IDs
sum(colnames(rgSet) != targets$BasenameOriginal)
## [1] 0
cbind(colnames(rgSet), targets$BasenameOriginal)
##       [,1]                  [,2]                 
##  [1,] "203839170002_R01C01" "203839170002_R01C01"
##  [2,] "203839170002_R02C01" "203839170002_R02C01"
##  [3,] "203839170002_R03C01" "203839170002_R03C01"
##  [4,] "203839170002_R04C01" "203839170002_R04C01"
##  [5,] "203839170002_R05C01" "203839170002_R05C01"
##  [6,] "203839170002_R06C01" "203839170002_R06C01"
##  [7,] "203839170002_R07C01" "203839170002_R07C01"
##  [8,] "203839170002_R08C01" "203839170002_R08C01"
##  [9,] "203836210101_R01C01" "203836210101_R01C01"
## [10,] "203836210101_R02C01" "203836210101_R02C01"
## [11,] "203836210101_R03C01" "203836210101_R03C01"
## [12,] "203836210101_R04C01" "203836210101_R04C01"
## [13,] "203836210101_R05C01" "203836210101_R05C01"
## [14,] "203836210101_R06C01" "203836210101_R06C01"
## [15,] "203836210101_R07C01" "203836210101_R07C01"
## [16,] "203836210101_R08C01" "203836210101_R08C01"
## [17,] "203839170001_R01C01" "203839170001_R01C01"
## [18,] "203839170001_R02C01" "203839170001_R02C01"
## [19,] "203839170001_R03C01" "203839170001_R03C01"
## [20,] "203839170001_R04C01" "203839170001_R04C01"
## [21,] "203839170001_R05C01" "203839170001_R05C01"
## [22,] "203839170001_R06C01" "203839170001_R06C01"
## [23,] "203839170001_R07C01" "203839170001_R07C01"
## [24,] "203839170001_R08C01" "203839170001_R08C01"
## [25,] "203833040074_R01C01" "203833040074_R01C01"
## [26,] "203833040074_R02C01" "203833040074_R02C01"
## [27,] "203833040074_R03C01" "203833040074_R03C01"
## [28,] "203833040074_R04C01" "203833040074_R04C01"
## [29,] "203833040074_R05C01" "203833040074_R05C01"
## [30,] "203833040074_R06C01" "203833040074_R06C01"
## [31,] "203833040074_R07C01" "203833040074_R07C01"
## [32,] "203833040074_R08C01" "203833040074_R08C01"
cbind(targets$ExternalSampleID, targets$Description)
##       [,1]       [,2]                                                       
##  [1,] "C -P 1"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [2,] "C +P 1"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [3,] "mtD -P 1" "Non-senescent mtDNA-containing controls no pyruvate"      
##  [4,] "mtD +P 1" "Non-senescent mtDNA-containing controls no pyruvate"      
##  [5,] "C -P 2"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [6,] "C +P 2"   "Non-senescent mtDNA-containing controls no pyruvate"      
##  [7,] "mtD -P 2" "Non-senescent mtDNA-containing controls with pyruvate"    
##  [8,] "mtD +P 2" "Non-senescent mtDNA-containing controls with pyruvate"    
##  [9,] "C -P 3"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [10,] "C +P 3"   "Non-senescent mtDNA-containing controls with pyruvate"    
## [11,] "mtD -P 3" "Non-senescent mtDNA-containing controls with pyruvate"    
## [12,] "mtD +P 3" "Non-senescent mtDNA-containing controls with pyruvate"    
## [13,] "C -P 4"   "Senescent mtDNA-depleted cells no pyruvate"               
## [14,] "C +P 4"   "Senescent mtDNA-depleted cells no pyruvate"               
## [15,] "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"               
## [16,] "mtD +P 4" "Senescent mtDNA-depleted cells no pyruvate"               
## [17,] "C -P 5"   "Senescent mtDNA-depleted cells no pyruvate"               
## [18,] "C +P 5"   "Senescent mtDNA-depleted cells no pyruvate"               
## [19,] "mtD -P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [20,] "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [21,] "C -P 6"   "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [22,] "C +P 6"   "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [23,] "mtD -P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [24,] "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate"
## [25,] " - ATV 1" "Non-senescent DMSO-treated control cells"                 
## [26,] " + ATV 1" "Non-senescent DMSO-treated control cells"                 
## [27,] " - ATV 2" "Non-senescent DMSO-treated control cells"                 
## [28,] " + ATV 2" "Non-senescent DMSO-treated control cells"                 
## [29,] " - ATV 3" "Senescent ATV-treated cells"                              
## [30,] " + ATV 3" "Senescent ATV-treated cells"                              
## [31,] " - ATV 4" "Senescent ATV-treated cells"                              
## [32,] " + ATV 4" "Senescent ATV-treated cells"
targets$sampleID <- ""
targets$sampleID[1:24] <- paste0("midas", 1:24)
targets$sampleID[25:32] <- paste0("ATV", 1:8)

translate_trt <- function(ch){
  switch(ch,
     "Non-senescent DMSO-treated control cells" = "noSen_ATV",
     "Non-senescent mtDNA-containing controls no pyruvate" = "noSen_Mt_noPy", 
     "Non-senescent mtDNA-containing controls with pyruvate" = "noSen_Mt_Py",
     "Non-senescent mtDNA-depleted controls cells with pyruvate" = "noSen_noMt_Py",
     "Senescent ATV-treated cells" = "sen_ATV",
     "Senescent mtDNA-depleted cells no pyruvate" = "sen_noMt_noPy"
     )
}

targets <- targets %>%
    mutate(treatment = map_chr(Description, translate_trt)) %>%
    mutate(treatment = factor(treatment))

cbind(targets$sampleID, targets$ExternalSampleID, targets$Description, as.character(targets$treatment))
##       [,1]      [,2]       [,3]                                                        [,4]           
##  [1,] "midas1"  "C -P 1"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
##  [2,] "midas2"  "C +P 1"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
##  [3,] "midas3"  "mtD -P 1" "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
##  [4,] "midas4"  "mtD +P 1" "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
##  [5,] "midas5"  "C -P 2"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
##  [6,] "midas6"  "C +P 2"   "Non-senescent mtDNA-containing controls no pyruvate"       "noSen_Mt_noPy"
##  [7,] "midas7"  "mtD -P 2" "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
##  [8,] "midas8"  "mtD +P 2" "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
##  [9,] "midas9"  "C -P 3"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [10,] "midas10" "C +P 3"   "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [11,] "midas11" "mtD -P 3" "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [12,] "midas12" "mtD +P 3" "Non-senescent mtDNA-containing controls with pyruvate"     "noSen_Mt_Py"  
## [13,] "midas13" "C -P 4"   "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [14,] "midas14" "C +P 4"   "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [15,] "midas15" "mtD -P 4" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [16,] "midas16" "mtD +P 4" "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [17,] "midas17" "C -P 5"   "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [18,] "midas18" "C +P 5"   "Senescent mtDNA-depleted cells no pyruvate"                "sen_noMt_noPy"
## [19,] "midas19" "mtD -P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [20,] "midas20" "mtD +P 5" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [21,] "midas21" "C -P 6"   "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [22,] "midas22" "C +P 6"   "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [23,] "midas23" "mtD -P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [24,] "midas24" "mtD +P 6" "Non-senescent mtDNA-depleted controls cells with pyruvate" "noSen_noMt_Py"
## [25,] "ATV1"    " - ATV 1" "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
## [26,] "ATV2"    " + ATV 1" "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
## [27,] "ATV3"    " - ATV 2" "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
## [28,] "ATV4"    " + ATV 2" "Non-senescent DMSO-treated control cells"                  "noSen_ATV"    
## [29,] "ATV5"    " - ATV 3" "Senescent ATV-treated cells"                               "sen_ATV"      
## [30,] "ATV6"    " + ATV 3" "Senescent ATV-treated cells"                               "sen_ATV"      
## [31,] "ATV7"    " - ATV 4" "Senescent ATV-treated cells"                               "sen_ATV"      
## [32,] "ATV8"    " + ATV 4" "Senescent ATV-treated cells"                               "sen_ATV"
cbind(targets$sampleID, targets$ExternalSampleID, as.character(targets$treatment))
##       [,1]      [,2]       [,3]           
##  [1,] "midas1"  "C -P 1"   "noSen_Mt_noPy"
##  [2,] "midas2"  "C +P 1"   "noSen_Mt_noPy"
##  [3,] "midas3"  "mtD -P 1" "noSen_Mt_noPy"
##  [4,] "midas4"  "mtD +P 1" "noSen_Mt_noPy"
##  [5,] "midas5"  "C -P 2"   "noSen_Mt_noPy"
##  [6,] "midas6"  "C +P 2"   "noSen_Mt_noPy"
##  [7,] "midas7"  "mtD -P 2" "noSen_Mt_Py"  
##  [8,] "midas8"  "mtD +P 2" "noSen_Mt_Py"  
##  [9,] "midas9"  "C -P 3"   "noSen_Mt_Py"  
## [10,] "midas10" "C +P 3"   "noSen_Mt_Py"  
## [11,] "midas11" "mtD -P 3" "noSen_Mt_Py"  
## [12,] "midas12" "mtD +P 3" "noSen_Mt_Py"  
## [13,] "midas13" "C -P 4"   "sen_noMt_noPy"
## [14,] "midas14" "C +P 4"   "sen_noMt_noPy"
## [15,] "midas15" "mtD -P 4" "sen_noMt_noPy"
## [16,] "midas16" "mtD +P 4" "sen_noMt_noPy"
## [17,] "midas17" "C -P 5"   "sen_noMt_noPy"
## [18,] "midas18" "C +P 5"   "sen_noMt_noPy"
## [19,] "midas19" "mtD -P 5" "noSen_noMt_Py"
## [20,] "midas20" "mtD +P 5" "noSen_noMt_Py"
## [21,] "midas21" "C -P 6"   "noSen_noMt_Py"
## [22,] "midas22" "C +P 6"   "noSen_noMt_Py"
## [23,] "midas23" "mtD -P 6" "noSen_noMt_Py"
## [24,] "midas24" "mtD +P 6" "noSen_noMt_Py"
## [25,] "ATV1"    " - ATV 1" "noSen_ATV"    
## [26,] "ATV2"    " + ATV 1" "noSen_ATV"    
## [27,] "ATV3"    " - ATV 2" "noSen_ATV"    
## [28,] "ATV4"    " + ATV 2" "noSen_ATV"    
## [29,] "ATV5"    " - ATV 3" "sen_ATV"      
## [30,] "ATV6"    " + ATV 3" "sen_ATV"      
## [31,] "ATV7"    " - ATV 4" "sen_ATV"      
## [32,] "ATV8"    " + ATV 4" "sen_ATV"
sampleNames(rgSet) <- targets$sampleID

4.2 Minfi normalization and QC filtering

Set probes to missing if detP > 0.05.

Remove probes that are bad quality (SNP overlap, cross-reactive, design problems).

#Probe and sample filtering

detP <- detectionP(rgSet)

barplot(colMeans(detP), las = 2, cex.names = 0.8, ylab = "Mean detection p-values")
abline(h = 0.05, col = "red")

qcReport(rgSet, sampNames = targets$sampleID, sampGroups = as.character(targets$treatment),
     pdf = "minfiQCreport_v2.pdf")
## png 
##   2
#remove bad samples from rgSet
keep <- colMeans(detP) < 0.05
keep
##  midas1  midas2  midas3  midas4  midas5  midas6  midas7  midas8  midas9 midas10 midas11 midas12 midas13 midas14 midas15 midas16 midas17 midas18 midas19 midas20 midas21 midas22 midas23 midas24    ATV1    ATV2    ATV3 
##    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE 
##    ATV4    ATV5    ATV6    ATV7    ATV8 
##    TRUE    TRUE    TRUE    TRUE    TRUE
rgSet <- rgSet[,keep]
rgSet
## class: RGChannelSet 
## dim: 1051815 32 
## metadata(0):
## assays(2): Green Red
## rownames(1051815): 1600101 1600111 ... 99810990 99810992
## rowData names(0):
## colnames(32): midas1 midas2 ... ATV7 ATV8
## colData names(22): OriginalOrder ExternalSampleID ... Slide filenames
## Annotation
##   array: IlluminaHumanMethylationEPIC
##   annotation: ilm10b4.hg19
#remove bad samples from targets
targets <- targets[keep,]

#remove bad samples from detP matrix
detP <- detP[, keep]

#Normalization
#must load manifest to perform preprocessNoob
mSet <- preprocessNoob(rgSet)
gset <- mapToGenome(mSet)
grSet <- ratioConvert(gset, what = "both")

#After normalization, can exclude probes
# Probe exclusion based on design problems
xreact_450 <- read_xlsx("~/bigdata/EWAStools/arrayAnnotation/48639-non-specific-probes-Illumina450k.xlsx", sheet = "nonspecific cg probes")
annot_EPIC <- read_tsv("~/bigdata/EWAStools/arrayAnnotation/EPIC.hg19.manifest.tsv")
## 
## -- Column specification ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## cols(
##   .default = col_double(),
##   CpG_chrm = col_character(),
##   probe_strand = col_character(),
##   probeID = col_character(),
##   channel = col_character(),
##   designType = col_character(),
##   nextBase = col_character(),
##   nextBaseRef = col_character(),
##   probeType = col_character(),
##   orientation = col_character(),
##   ProbeSeq_A = col_character(),
##   ProbeSeq_B = col_character(),
##   gene = col_character(),
##   gene_HGNC = col_character(),
##   chrm_A = col_character(),
##   cigar_A = col_character(),
##   chrm_B = col_character(),
##   cigar_B = col_character(),
##   wDecoy_chrm_A = col_character(),
##   wDecoy_cigar_A = col_character(),
##   wDecoy_chrm_B = col_character()
##   # ... with 13 more columns
## )
## i Use `spec()` for the full column specifications.
table(annot_EPIC$MASK_general[annot_EPIC$probeID %in% xreact_450$TargetID])
## 
## FALSE  TRUE 
##  1956 25268
annot_EPIC$MASK_general[annot_EPIC$probeID %in% xreact_450$TargetID] <- TRUE
arrayProbes <- featureNames(grSet)
annot_EPIC <- annot_EPIC[match(arrayProbes, annot_EPIC$probeID), ]
sum(arrayProbes != annot_EPIC$probeID)
## [1] 0
grSet <- grSet[!annot_EPIC$MASK_general,]
dim(grSet)
## [1] 764543     32
#Probe exclusion based on detP
arrayProbes <- featureNames(grSet)
detP <- detP[match(arrayProbes, rownames(detP)), ]
# remove probes that failed in one or more samples
keep <- rowSums(detP < 0.05) == ncol(grSet)
table(keep)
## keep
##  FALSE   TRUE 
##   1932 762611
grSet <- grSet[keep,]

# Data exploration with PCA and boxplots
Mvals <- getM(grSet)
boxplot(Mvals, las = 2, ylab = "M-values")

group <- targets$treatment
col.group <- group
levels(col.group) <- brewer.pal(nlevels(col.group), "Dark2")
col.group.ch <- as.character(col.group)
plotMDS(Mvals, gene.selection = "common", col = col.group.ch, pch = 16)
legend("topleft", fill = levels(col.group), legend = levels(group))

cbind(levels(group), levels(col.group))
##      [,1]            [,2]     
## [1,] "noSen_ATV"     "#1B9E77"
## [2,] "noSen_Mt_Py"   "#D95F02"
## [3,] "noSen_Mt_noPy" "#7570B3"
## [4,] "noSen_noMt_Py" "#E7298A"
## [5,] "sen_ATV"       "#66A61E"
## [6,] "sen_noMt_noPy" "#E6AB02"
cbind(colnames(Mvals), as.character(group), col.group.ch)
##                                 col.group.ch
##  [1,] "midas1"  "noSen_Mt_noPy" "#7570B3"   
##  [2,] "midas2"  "noSen_Mt_noPy" "#7570B3"   
##  [3,] "midas3"  "noSen_Mt_noPy" "#7570B3"   
##  [4,] "midas4"  "noSen_Mt_noPy" "#7570B3"   
##  [5,] "midas5"  "noSen_Mt_noPy" "#7570B3"   
##  [6,] "midas6"  "noSen_Mt_noPy" "#7570B3"   
##  [7,] "midas7"  "noSen_Mt_Py"   "#D95F02"   
##  [8,] "midas8"  "noSen_Mt_Py"   "#D95F02"   
##  [9,] "midas9"  "noSen_Mt_Py"   "#D95F02"   
## [10,] "midas10" "noSen_Mt_Py"   "#D95F02"   
## [11,] "midas11" "noSen_Mt_Py"   "#D95F02"   
## [12,] "midas12" "noSen_Mt_Py"   "#D95F02"   
## [13,] "midas13" "sen_noMt_noPy" "#E6AB02"   
## [14,] "midas14" "sen_noMt_noPy" "#E6AB02"   
## [15,] "midas15" "sen_noMt_noPy" "#E6AB02"   
## [16,] "midas16" "sen_noMt_noPy" "#E6AB02"   
## [17,] "midas17" "sen_noMt_noPy" "#E6AB02"   
## [18,] "midas18" "sen_noMt_noPy" "#E6AB02"   
## [19,] "midas19" "noSen_noMt_Py" "#E7298A"   
## [20,] "midas20" "noSen_noMt_Py" "#E7298A"   
## [21,] "midas21" "noSen_noMt_Py" "#E7298A"   
## [22,] "midas22" "noSen_noMt_Py" "#E7298A"   
## [23,] "midas23" "noSen_noMt_Py" "#E7298A"   
## [24,] "midas24" "noSen_noMt_Py" "#E7298A"   
## [25,] "ATV1"    "noSen_ATV"     "#1B9E77"   
## [26,] "ATV2"    "noSen_ATV"     "#1B9E77"   
## [27,] "ATV3"    "noSen_ATV"     "#1B9E77"   
## [28,] "ATV4"    "noSen_ATV"     "#1B9E77"   
## [29,] "ATV5"    "sen_ATV"       "#66A61E"   
## [30,] "ATV6"    "sen_ATV"       "#66A61E"   
## [31,] "ATV7"    "sen_ATV"       "#66A61E"   
## [32,] "ATV8"    "sen_ATV"       "#66A61E"
## Identify PCA outliers
myMDS <- plotMDS(Mvals, gene.selection = "common", plot = FALSE)
cluster_top <- myMDS$y[myMDS$y > 0.2 ]
cluster_bottom <- myMDS$x[myMDS$x < (-1) ]
cluster_ATV <- myMDS$x[myMDS$x > 2 ]

#cluster top
targets %>%
    filter(sampleID %in% names(cluster_top)) %>%
    dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
    kable
sampleID SampleLabel treatment Description
midas1 C -P 1 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas2 C -P 2 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas5 C -P 5 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas6 C -P 6 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas9 C +P 3 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas10 C +P 4 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas13 mtD -P 1 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas14 mtD -P 2 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas17 mtD -P 5 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas18 mtD -P 6 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas21 mtD +P 3 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas22 mtD +P 4 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
#cluster bottom
targets %>%
    filter(sampleID %in% names(cluster_bottom)) %>%
    dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
    kable
sampleID SampleLabel treatment Description
midas3 C -P 3 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas4 C -P 4 noSen_Mt_noPy Non-senescent mtDNA-containing controls no pyruvate
midas7 C +P 1 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas8 C +P 2 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas11 C +P 5 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas12 C +P 6 noSen_Mt_Py Non-senescent mtDNA-containing controls with pyruvate
midas15 mtD -P 3 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas16 mtD -P 4 sen_noMt_noPy Senescent mtDNA-depleted cells no pyruvate
midas19 mtD +P 1 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas20 mtD +P 2 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas23 mtD +P 5 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
midas24 mtD +P 6 noSen_noMt_Py Non-senescent mtDNA-depleted controls cells with pyruvate
#cluster ATV
targets %>%
    filter(sampleID %in% names(cluster_ATV)) %>%
    dplyr::select(sampleID, SampleLabel, treatment, Description) %>%
    kable
sampleID SampleLabel treatment Description
ATV1 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV2 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV3 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV4 - ATV noSen_ATV Non-senescent DMSO-treated control cells
ATV5 + ATV sen_ATV Senescent ATV-treated cells
ATV6 + ATV sen_ATV Senescent ATV-treated cells
ATV7 + ATV sen_ATV Senescent ATV-treated cells
ATV8 + ATV sen_ATV Senescent ATV-treated cells

4.3 Mvals for association analysis

Using limma with Mvals

#Targets and sample IDs in same order
sum(colnames(Mvals) != targets$sampleID)
## [1] 0
cbind(colnames(Mvals), targets$sampleID)
##       [,1]      [,2]     
##  [1,] "midas1"  "midas1" 
##  [2,] "midas2"  "midas2" 
##  [3,] "midas3"  "midas3" 
##  [4,] "midas4"  "midas4" 
##  [5,] "midas5"  "midas5" 
##  [6,] "midas6"  "midas6" 
##  [7,] "midas7"  "midas7" 
##  [8,] "midas8"  "midas8" 
##  [9,] "midas9"  "midas9" 
## [10,] "midas10" "midas10"
## [11,] "midas11" "midas11"
## [12,] "midas12" "midas12"
## [13,] "midas13" "midas13"
## [14,] "midas14" "midas14"
## [15,] "midas15" "midas15"
## [16,] "midas16" "midas16"
## [17,] "midas17" "midas17"
## [18,] "midas18" "midas18"
## [19,] "midas19" "midas19"
## [20,] "midas20" "midas20"
## [21,] "midas21" "midas21"
## [22,] "midas22" "midas22"
## [23,] "midas23" "midas23"
## [24,] "midas24" "midas24"
## [25,] "ATV1"    "ATV1"   
## [26,] "ATV2"    "ATV2"   
## [27,] "ATV3"    "ATV3"   
## [28,] "ATV4"    "ATV4"   
## [29,] "ATV5"    "ATV5"   
## [30,] "ATV6"    "ATV6"   
## [31,] "ATV7"    "ATV7"   
## [32,] "ATV8"    "ATV8"
design <- model.matrix(~0 + treatment, data = targets)
design
##    treatmentnoSen_ATV treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py treatmentsen_ATV treatmentsen_noMt_noPy
## 1                   0                    0                      1                      0                0                      0
## 2                   0                    0                      1                      0                0                      0
## 3                   0                    0                      1                      0                0                      0
## 4                   0                    0                      1                      0                0                      0
## 5                   0                    0                      1                      0                0                      0
## 6                   0                    0                      1                      0                0                      0
## 7                   0                    1                      0                      0                0                      0
## 8                   0                    1                      0                      0                0                      0
## 9                   0                    1                      0                      0                0                      0
## 10                  0                    1                      0                      0                0                      0
## 11                  0                    1                      0                      0                0                      0
## 12                  0                    1                      0                      0                0                      0
## 13                  0                    0                      0                      0                0                      1
## 14                  0                    0                      0                      0                0                      1
## 15                  0                    0                      0                      0                0                      1
## 16                  0                    0                      0                      0                0                      1
## 17                  0                    0                      0                      0                0                      1
## 18                  0                    0                      0                      0                0                      1
## 19                  0                    0                      0                      1                0                      0
## 20                  0                    0                      0                      1                0                      0
## 21                  0                    0                      0                      1                0                      0
## 22                  0                    0                      0                      1                0                      0
## 23                  0                    0                      0                      1                0                      0
## 24                  0                    0                      0                      1                0                      0
## 25                  1                    0                      0                      0                0                      0
## 26                  1                    0                      0                      0                0                      0
## 27                  1                    0                      0                      0                0                      0
## 28                  1                    0                      0                      0                0                      0
## 29                  0                    0                      0                      0                1                      0
## 30                  0                    0                      0                      0                1                      0
## 31                  0                    0                      0                      0                1                      0
## 32                  0                    0                      0                      0                1                      0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$treatment
## [1] "contr.treatment"
colSums(design)
##     treatmentnoSen_ATV   treatmentnoSen_Mt_Py treatmentnoSen_Mt_noPy treatmentnoSen_noMt_Py       treatmentsen_ATV treatmentsen_noMt_noPy 
##                      4                      6                      6                      6                      4                      6
cm <- makeContrasts(ATVvControl = treatmentsen_ATV - treatmentnoSen_ATV,
            SenMidasvNoSenMtnoPy = treatmentsen_noMt_noPy - treatmentnoSen_Mt_noPy,
            SenMidasvNoSenNoMtPy = treatmentsen_noMt_noPy - treatmentnoSen_noMt_Py,
            levels = design
            )
fit <- lmFit(Mvals, design)
fit2 <- contrasts.fit(fit, contrasts = cm)
fit2 <- eBayes(fit2)

results <- decideTests(fit2)
summary(results)
##        ATVvControl SenMidasvNoSenMtnoPy SenMidasvNoSenNoMtPy
## Down             0                    0                46144
## NotSig      762611               762611               704604
## Up               0                    0                11863
vennDiagram(results)

volcanoplot(fit2, coef = "ATVvControl" , main = "ATV vs control")

volcanoplot(fit2, coef = "SenMidasvNoSenMtnoPy", main = "MiDAS vs control with MtDNA")

volcanoplot(fit2, coef = "SenMidasvNoSenNoMtPy", main = "MiDAS vs control without mtDNA")

topTable(fit2, coef = "ATVvControl", confint = TRUE) %>%
    kable()
logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg16201883 0.9526968 0.6725338 1.2328599 3.690292 6.942084 1.0e-07 0.0752688 1.9772962
cg05454501 -0.7314665 -0.9648308 -0.4981023 -4.810318 -6.398913 4.0e-07 0.1619798 1.4212744
cg05238741 -1.4559297 -1.9302228 -0.9816366 -4.455730 -6.266710 6.0e-07 0.1619798 1.2795059
cg17367243 -0.6020866 -0.8134781 -0.3906951 1.031197 -5.814567 2.3e-06 0.4320972 0.7760306
cg07002602 0.6153742 0.3949186 0.8358297 5.764520 5.698549 3.1e-06 0.4796175 0.6423252
cg05045343 -0.6911336 -0.9490007 -0.4332664 -4.942574 -5.471571 6.0e-06 0.6965711 0.3756587
cg17890984 -0.4563705 -0.6282654 -0.2844757 -6.148649 -5.420017 6.9e-06 0.6965711 0.3141833
cg23569120 0.9853088 0.6124049 1.3582128 5.765368 5.394130 7.4e-06 0.6965711 0.2831897
cg08749686 0.5386579 0.3326707 0.7446452 3.591951 5.338493 8.7e-06 0.6965711 0.2163056
cg23304785 -0.6944412 -0.9609108 -0.4279717 -4.133183 -5.320273 9.2e-06 0.6965711 0.1943221
topTable(fit2, coef = "SenMidasvNoSenMtnoPy", confint = TRUE) %>%
    kable()
logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg12520327 -0.7460243 -0.9934968 -0.4985519 -2.9236813 -6.154207 9.0e-07 0.4846866 -0.5323738
cg20416110 -0.5665818 -0.7618443 -0.3713193 -4.5358308 -5.923655 1.7e-06 0.4846866 -0.7022375
cg00221525 -0.7687916 -1.0382295 -0.4993537 -2.5574900 -5.825000 2.2e-06 0.4846866 -0.7767012
cg04408387 3.5065388 2.2424641 4.7706135 0.0947063 5.663070 3.5e-06 0.4846866 -0.9012067
cg08080985 -0.7171454 -0.9823526 -0.4519382 -3.4521468 -5.520367 5.2e-06 0.4846866 -1.0132445
cg08626888 0.5848446 0.3677380 0.8019511 -3.9550422 5.499377 5.5e-06 0.4846866 -1.0299045
cg13178372 -0.6007792 -0.8261185 -0.3754398 4.0665387 -5.442818 6.5e-06 0.4846866 -1.0750233
cg04455450 -0.5703644 -0.7843846 -0.3563441 -3.3200356 -5.440559 6.5e-06 0.4846866 -1.0768323
cg16979575 -0.6244025 -0.8600630 -0.3887420 -4.0162692 -5.409085 7.1e-06 0.4846866 -1.1020904
cg01613696 -0.5698075 -0.7849632 -0.3546517 -3.6111283 -5.406562 7.2e-06 0.4846866 -1.1041193
topTable(fit2, coef = "SenMidasvNoSenNoMtPy", confint = TRUE) %>%
    kable()
logFC CI.L CI.R AveExpr t P.Value adj.P.Val B
cg26591221 -1.1088454 -1.3706771 -0.8470136 4.8466724 -8.645590 0e+00 0.0007757 11.552813
cg00675940 -1.1067892 -1.3757625 -0.8378158 0.9044313 -8.400432 0e+00 0.0007757 11.013136
cg00399115 -4.0520858 -5.0640349 -3.0401367 -3.9722647 -8.174590 0e+00 0.0007757 10.507458
cg02810043 4.6869597 3.5115466 5.8623728 -2.9951128 8.140418 0e+00 0.0007757 10.430239
cg21189727 -0.7224616 -0.9174104 -0.5275129 -3.6299923 -7.565547 0e+00 0.0027958 9.103977
cg00329052 -1.2690241 -1.6145267 -0.9235215 1.0432169 -7.498330 0e+00 0.0027958 8.945626
cg09107055 -0.9070941 -1.1580408 -0.6561475 -0.8023139 -7.379330 0e+00 0.0029760 8.663656
cg19345940 -0.8044017 -1.0272980 -0.5815055 1.8514201 -7.367433 0e+00 0.0029760 8.635353
cg00864346 -0.7356627 -0.9425844 -0.5287411 2.5877843 -7.258032 0e+00 0.0035499 8.374136
cg26815147 1.0939721 0.7815802 1.4063641 -2.1266227 7.149118 1e-07 0.0039605 8.112397
resultAll <- topTable(fit2, coef = "ATVvControl", confint = TRUE, sort.by = "none", number = Inf)

write_csv(resultAll, path = "../results/limma_ATV.csv")

5 R session

sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  utils     datasets  grDevices methods   base     
## 
## other attached packages:
##  [1] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0 IlluminaHumanMethylationEPICmanifest_0.3.0          minfi_1.34.0                                        bumphunter_1.30.0                                  
##  [5] locfit_1.5-9.4                                      iterators_1.0.13                                    foreach_1.5.1                                       Biostrings_2.56.0                                  
##  [9] XVector_0.28.0                                      SummarizedExperiment_1.18.2                         DelayedArray_0.14.1                                 matrixStats_0.57.0                                 
## [13] GenomicRanges_1.40.0                                GenomeInfoDb_1.24.2                                 IRanges_2.22.2                                      S4Vectors_0.26.1                                   
## [17] RColorBrewer_1.1-2                                  limma_3.44.3                                        multtest_2.44.0                                     Biobase_2.48.0                                     
## [21] wheatmap_0.1.0                                      sesame_1.6.0                                        sesameData_1.6.0                                    ExperimentHub_1.14.2                               
## [25] AnnotationHub_2.20.2                                BiocFileCache_1.12.1                                dbplyr_1.4.4                                        BiocGenerics_0.34.0                                
## [29] knitr_1.30                                          readxl_1.3.1                                        forcats_0.5.0                                       stringr_1.4.0                                      
## [33] dplyr_1.0.2                                         purrr_0.3.4                                         readr_1.4.0                                         tidyr_1.1.2                                        
## [37] tibble_3.0.4                                        ggplot2_3.3.2                                       tidyverse_1.3.0                                     BiocStyle_2.16.1                                   
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.10              plyr_1.8.6                    splines_4.0.1                 BiocParallel_1.22.0           digest_0.6.26                 htmltools_0.5.0               magick_2.5.0                 
##   [8] fansi_0.4.1                   magrittr_1.5                  memoise_1.1.0                 annotate_1.66.0               modelr_0.1.8                  siggenes_1.62.0               askpass_1.1                  
##  [15] prettyunits_1.1.1             colorspace_1.4-1              blob_1.2.1                    rvest_0.3.6                   rappdirs_0.3.1                haven_2.3.1                   xfun_0.18                    
##  [22] hexbin_1.28.1                 crayon_1.3.4                  RCurl_1.98-1.2                jsonlite_1.7.1                genefilter_1.70.0             GEOquery_2.56.0               survival_3.2-7               
##  [29] glue_1.4.2                    gtable_0.3.0                  zlibbioc_1.34.0               Rhdf5lib_1.10.1               HDF5Array_1.16.1              scales_1.1.1                  DBI_1.1.0                    
##  [36] rngtools_1.5                  Rcpp_1.0.5                    xtable_1.8-4                  progress_1.2.2                mclust_5.4.6                  bit_4.0.4                     preprocessCore_1.50.0        
##  [43] httr_1.4.2                    ellipsis_0.3.1                farver_2.0.3                  reshape_0.8.8                 pkgconfig_2.0.3               XML_3.99-0.5                  utf8_1.1.4                   
##  [50] DNAcopy_1.62.0                labeling_0.4.2                tidyselect_1.1.0              rlang_0.4.8                   later_1.1.0.1                 AnnotationDbi_1.50.3          munsell_0.5.0                
##  [57] BiocVersion_3.11.1            cellranger_1.1.0              tools_4.0.1                   cli_2.1.0                     generics_0.0.2                RSQLite_2.2.1                 broom_0.7.2                  
##  [64] evaluate_0.14                 fastmap_1.0.1                 yaml_2.2.1                    bit64_4.0.5                   fs_1.5.0                      beanplot_1.2                  scrime_1.3.5                 
##  [71] randomForest_4.6-14           nlme_3.1-149                  doRNG_1.8.2                   mime_0.9                      nor1mix_1.3-0                 xml2_1.3.2                    biomaRt_2.44.4               
##  [78] compiler_4.0.1                rstudioapi_0.11               curl_4.3                      interactiveDisplayBase_1.26.3 reprex_0.3.0                  stringi_1.5.3                 highr_0.8                    
##  [85] ps_1.4.0                      GenomicFeatures_1.40.1        lattice_0.20-41               Matrix_1.2-18                 vctrs_0.3.4                   pillar_1.4.6                  lifecycle_0.2.0              
##  [92] BiocManager_1.30.10           data.table_1.13.2             bitops_1.0-6                  httpuv_1.5.4                  rtracklayer_1.48.0            R6_2.4.1                      bookdown_0.21                
##  [99] promises_1.1.1                codetools_0.2-16              MASS_7.3-53                   assertthat_0.2.1              rhdf5_2.32.4                  openssl_1.4.3                 withr_2.3.0                  
## [106] GenomicAlignments_1.24.0      Rsamtools_2.4.0               GenomeInfoDbData_1.2.3        hms_0.5.3                     quadprog_1.5-8                grid_4.0.1                    base64_2.0                   
## [113] rmarkdown_2.5                 DelayedMatrixStats_1.10.1     illuminaio_0.30.0             shiny_1.5.0                   lubridate_1.7.9